Machine Learning applied to biodiversity data

Description

Artificial intelligence, and machine learning in particular, has grown rapidly in recent years. It offers a wide range of techniques that can be used individually or in combination to analyze data, extract patterns, and generate new data across many application areas, and it has driven significant advances in those disciplines.

This course was developed in response to the training needs in different areas of data science identified by the Mesoamerican Data Science Network for Biodiversity Conservation (redbioma). It is aimed at professionals working in biodiversity conservation, so it focuses on problem-solving and on developing the knowledge and skills needed to design and implement simple machine learning models (using the Python programming language) applied to datasets relevant to the participants' professional areas.


Schedule and start date
The course will be offered in two groups with the following schedules:
  • Group #1: Starts on Wednesday, July 10, and will be held every Wednesday from 3:00 PM to 5:00 PM (GMT-6) for 8 weeks.
  • Group #2: Starts on Thursday, July 11, and will be held every Thursday from 6:00 PM to 8:00 PM (GMT-6) for 8 weeks.


Course type
  • Modality: Virtual.
  • Theoretical/Practical: To complete the program, participants must attend more than 75% of virtual synchronous classes and achieve an average greater than or equal to 70 in evaluations.
  • Cost: Free.


Requirements
  • Availability of at least 16 hours over the entire program to attend eight virtual synchronous sessions (2 hrs/class).
  • Availability of at least 24 hours over the entire program to complete short assignments, labs, and a final project (3 hrs/week).
  • Basic knowledge of Python programming, the NumPy and pandas libraries for data handling, and general knowledge of geospatial data representation. Preferably, participants should have completed the Introduction to Python for data science course offered as part of this training program.
  • Fill out the form to participate in redbioma activities (previously circulated; please fill it out only once).


Registration form

Link: Registration for Machine Learning course


Objectives

General

Develop problem-solving skills for simple cases related to biodiversity conservation using machine learning techniques, and understand this branch of data science and artificial intelligence and its current challenges.


Specific

  • Build a solid understanding of basic concepts, the machine learning development cycle, and visualization techniques, as well as the role of data in the process, to help students identify the machine learning skills required for their present and future professional development.
  • Identify the appropriate type of machine learning for each problem type.
  • Build simple models based on supervised, unsupervised, semi-supervised, and deep learning techniques relevant to biodiversity conservation.


Course methodology

The course methodology is based on active and collaborative learning through problem-solving in labs, research work, and the flipped classroom, among other techniques. The purpose is to strengthen students' ability to do research, use public datasets, critically analyze scientific articles, and apply new concepts based on previously acquired knowledge and course content.

The program is theoretical/practical, allowing participants to apply theoretical knowledge through case studies, group discussions, labs, and research projects.

Important:
  • All synchronous sessions will be recorded and published on the redbioma website.
  • Final research projects will be published on the redbioma website.


Program content

  1. Fundamentals of Machine Learning
    1. Brief historical overview (timeline, past and future).
    2. Basic definitions.
    3. Theoretical foundations. [1]
    4. Machine Learning Cycle: design, implementation, evaluation, result interpretation, and deployment. [2]
    5. Role of data: types of data, preparation.
    6. No/Low-code AI, Generative AI.
  2. Machine Learning Approaches
    1. Types of problems.
    2. Types of machine learning.
    3. Criteria for technique selection and result comparison.
    4. Considerations for dataset design based on the type of machine learning.
  3. Supervised Learning
    1. Description of supervised learning techniques.
    2. Building simple models based on supervised learning techniques relevant to biodiversity conservation.
  4. Unsupervised Learning
    1. Description of unsupervised learning techniques.
    2. Building simple models based on unsupervised learning techniques relevant to biodiversity conservation.
  5. Semi-Supervised Learning
    1. Description of semi-supervised learning techniques.
    2. Building simple models based on semi-supervised learning techniques relevant to biodiversity conservation.
  6. Deep Learning
    1. Description of deep learning techniques.
    2. Building simple models based on deep learning techniques relevant to biodiversity conservation. [3,4]
  7. Final Reflections
    1. Challenges and risks of machine learning. [5]
    2. Opportunities for application in biodiversity conservation.


Evaluation

Students will complete short assignments, labs, and a final project. Evaluation items are as follows:


Item                 Value (%)
Short assignments    30
Labs                 40
Final project        30
Total                100

The final project involves identifying a machine learning use case relevant to the professional context of the participant.

Each evaluation will have a previously established due date. Submissions must be made by 11:45 PM (GMT-6). Late submissions will not be accepted. Google Classroom will be used for submissions.
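
As a worked illustration, the completion criteria stated in this syllabus (attendance above 75% of the eight synchronous sessions and an average of at least 70) can be combined with the evaluation weights in a short Python sketch. It assumes that each evaluation item is scored on a 0-100 scale and that the average is the weighted sum of the three items; the function names and the example scores are hypothetical and are not part of the official grading procedure.

```python
# Weights taken from the Evaluation table (percent of the final average).
WEIGHTS = {"short_assignments": 30, "labs": 40, "final_project": 30}

TOTAL_SESSIONS = 8      # eight virtual synchronous sessions
MIN_ATTENDANCE = 0.75   # participants must attend MORE than this fraction
PASSING_AVERAGE = 70    # the average must be greater than or equal to this


def final_average(scores):
    """Weighted average, assuming every item score is on a 0-100 scale."""
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS) / 100


def completes_program(scores, sessions_attended):
    """Apply both completion criteria: attendance and weighted average."""
    attendance = sessions_attended / TOTAL_SESSIONS
    return attendance > MIN_ATTENDANCE and final_average(scores) >= PASSING_AVERAGE


# Hypothetical participant scores, for illustration only.
example = {"short_assignments": 80, "labs": 65, "final_project": 75}

print(final_average(example))         # (30*80 + 40*65 + 30*75) / 100 = 72.5
print(completes_program(example, 7))  # True: 7/8 = 87.5% attendance
print(completes_program(example, 6))  # False: 6/8 is exactly 75%, not more
```

Under these assumptions, a participant needs to attend at least seven of the eight sessions, because six sessions is exactly 75% and the requirement is more than 75%.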


Class schedule

Class                                Week
Fundamentals of Machine Learning     1
Machine Learning Approaches          2
Supervised Learning                  3
Unsupervised Learning                4
Semi-Supervised Learning             5
Reinforcement Learning               5
Deep Learning                        6-8
Final reflections                    8
Project presentations                8


Materials

We recommend studying the following tutorials prior to the course:


References

  1. Greener, J. G., Kandathil, S. M., & Jones, D. T. (2022). A guide to machine learning for biologists. Nature Reviews Molecular Cell Biology, 23(1), 40–55.

  2. Guralnick, R. P., LaFrance, R., Allen, J. M., & Denslow, M. W. (2024). Ensemble automated approaches for producing high‐quality herbarium digital records. Applications in Plant Sciences, 13(1), e11623.

  3. Triki, A., Bouaziz, B., & Mahdi, W. (2022). A deep learning-based approach for detecting plant organs from digitized herbarium specimen images. Ecological Informatics, 69, 101590.

  4. Weaver, W. N., & Smith, S. A. (2023). From leaves to labels: Building modular machine learning networks for rapid herbarium specimen analysis with LeafMachine2. Applications in Plant Sciences, 11(5), e11548.

  5. Sworna, Z. T., Urzedo, D., Hoskins, A. J., & Robinson, C. J. (2024). The ethical implications of Chatbot developments for conservation expertise. AI and Ethics, 4, 917–926.


Contacts

Professors                             Email
Instructor: Emilia Zeledón Lostalo     emilia.zeledon@gmail.com
María Auxiliadora Mora                 maria.mora@itcr.ac.cr