STAR Protocols: A deep learning framework for cellular image analysis in microbiology

Before you begin

Deep learning (DL) has proven to be extremely effective in addressing a range of major biological challenges, including protein structure prediction,⁴ DNA sequencing,⁵ and drug discovery.⁶ The application of DL has expanded into the microbiological field,⁷ particularly in cellular image analysis. Traditional cellular image analysis faces several challenges.

One challenge is that parasites show markedly different morphologies across the stages of their complex life cycles,⁸ and the shape and size of the cells can vary considerably,⁹ making the classification and detection of different parasites and cells quite difficult. Additionally, obtaining high-quality, in-focus microscopic images can be challenging⁷ because of factors such as the diffraction barrier and defects in optical systems.¹⁰

DL-based cellular image analysis can solve these problems to some extent. However, the black-box nature of DL often leads to unexplainable results. Incorporating the knowledge and insights of experts into the modeling process can help to address this limitation, but most DL-based methods have not considered the importance of knowledge from microbiologists in cellular image analysis.¹¹,¹² Existing methods are also highly specialized and lack detailed instructions that most microbiologists could follow. As a result, it can be challenging to develop accurate and easy-to-use DL models for cellular image analysis in microbiology.

To address these challenges, this protocol introduces a knowledge-integrated DL framework for cellular image analysis in microbiology. Building upon the previous studies of our group,¹,²,³ this protocol provides a comprehensive guide to implementing a wide spectrum of tasks (i.e., classification, detection, and reconstruction) in cellular image analysis. The following sections describe how the DL models integrate human expert knowledge and provide step-by-step instructions accessible to both beginners and professionals.

Description of the methods

This protocol introduces three DL models integrated with knowledge from microbiologists, namely deep cycle transfer learning (DCTL),¹ geometric-feature spectrum ExtremeNet (GFS-ExtremeNet),² and correcting out-of-focus microscopic images (COMI).³ These models are designed for the classification, detection, and reconstruction tasks of cellular images in microbiology.

DCTL and COMI are both based on cycle generative adversarial networks (CycleGAN),¹³ as illustrated in Figure 1A. CycleGAN comprises two sets of generator-discriminator structures, which are different types of neural networks with distinct functionalities. Generators transform the input images into different styles, while discriminators identify whether the images are synthesized or not. Unlike traditional GANs, the cyclic network topology does not require one-to-one pairing of source images (Domain_X) and target images (Domain_Y), a property that DCTL relies on.

In DCTL, Domain_X contains the morphologically similar macroscopic objects, while Domain_Y contains the parasites to be recognized. Generator_X→Y transforms the macroscopic images in Domain_X into their corresponding parasite images, Synthetic_Y. Then, Generator_Y→X restores the images in Synthetic_Y back to the original macroscopic images, Restoration_X. Another cycle performs the same process in reverse. Finally, the discriminators distinguish between the generated images and the original images, and this feedback helps the generators improve the quality of the generated images.
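The forward cycle described above (Domain_X to Synthetic_Y to Restoration_X) can be sketched with toy stand-ins. This is only an illustrative sketch: the two "generators" below are hypothetical pixel-wise maps rather than the convolutional networks used in DCTL, and the L1 cycle-consistency loss is assumed here because it is the usual choice in CycleGAN, not because this text specifies it.

```python
# Toy illustration of one CycleGAN cycle: Domain_X -> Synthetic_Y -> Restoration_X.
# All functions are hypothetical stand-ins, not the DCTL networks.

def generator_x_to_y(image):
    """Hypothetical Generator_X->Y: a simple pixel-wise affine map."""
    return [0.5 * pixel + 0.25 for pixel in image]

def generator_y_to_x(image):
    """Hypothetical Generator_Y->X: the inverse of the map above."""
    return [2.0 * pixel - 0.5 for pixel in image]

def imperfect_generator_y_to_x(image):
    """A deliberately biased restoration, to show a non-zero loss."""
    return [2.0 * pixel - 0.4 for pixel in image]

def cycle_consistency_loss(original, restored):
    """Mean absolute (L1) difference between an image and its restoration."""
    return sum(abs(o - r) for o, r in zip(original, restored)) / len(original)

domain_x = [0.0, 0.2, 0.6, 1.0]                 # flattened 2x2 "macroscopic" image
synthetic_y = generator_x_to_y(domain_x)        # Domain_X -> Synthetic_Y
restoration_x = generator_y_to_x(synthetic_y)   # Synthetic_Y -> Restoration_X

good_loss = cycle_consistency_loss(domain_x, restoration_x)
bad_loss = cycle_consistency_loss(domain_x, imperfect_generator_y_to_x(synthetic_y))
```

When the restoring generator inverts the forward generator well, the loss is close to zero; the biased variant yields a larger loss, which is the kind of signal that would push the generators toward consistent mappings during training.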

Building upon the backbone of CycleGAN, DCTL incorporates human expert knowledge through two supplementary feature extractors, as shown in Figure 1B. Using four groups of extreme points, it calculates the microscopic and macroscopic correlation (MMC)¹ to find the morphologically similar macroscopic objects of each parasite as a quantitative knowledge representation (Figure 2A). CycleGAN then learns the morphological information from these two image domains and teaches the supplementary feature extractors to identify different parasites. Each feature extractor is trained on both original images and synthetic images using a Cross-Entropy loss function.¹⁴ Once the model training is completed, the supplementary Feature Extractor_Y can be applied to classify the four types of parasites.
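As a rough illustration of training a feature extractor on both original and synthetic images with a cross-entropy loss, the sketch below averages the loss over a mixed batch. The logits, labels, and the equal weighting of original and synthetic samples are assumptions made for illustration; they are not taken from the DCTL implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])

def batch_loss(samples):
    """Mean cross-entropy over (logits, label) pairs."""
    return sum(cross_entropy(logits, label) for logits, label in samples) / len(samples)

# Hypothetical classifier outputs for four parasite classes (indices 0-3).
original_batch = [([2.0, 0.0, 0.0, 0.0], 0), ([0.0, 3.0, 0.0, 0.0], 1)]
synthetic_batch = [([0.0, 0.0, 0.0, 0.0], 2)]   # an uninformative prediction

# Original and synthetic samples are pooled with equal weight (an assumption).
loss = batch_loss(original_batch + synthetic_batch)
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], 2)
```

An uninformative prediction over four classes costs log(4), while confident correct predictions cost less, so the pooled loss falls as the extractor learns to separate the classes in both real and generated images.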

Key resources table

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Software and algorithms
Anaconda	Anaconda v2.4.0	https://www.anaconda.com/
Spyder	Spyder v5.3.3	https://www.spyder-ide.org/
Python3	Python v3.7.16	https://www.python.org/
Tensorflow	Tensorflow v1.15.0	https://www.tensorflow.org/
Tensorboard	Tensorboard v1.15.0	https://pypi.org/project/tensorboard/
Tensorflow-estimator	Tensorflow-estimator v1.15.1	https://pypi.org/project/tensorflow-estimator/
Pytorch	Pytorch v1.2.0	https://pytorch.org/
Torchvision	Torchvision v0.4.0	https://pytorch.org/
Keras	Keras v2.2.4	https://keras.io/
Keras-contrib	Keras-contrib v2.0.8	https://github.com/keras-team/keras-contrib
H5py	H5py v2.10.0	https://www.h5py.org/
Scikit-learn	Scikit-learn v1.0.2	https://scikit-learn.org/stable/
Matplotlib	Matplotlib v3.5.3	https://matplotlib.org/
Scikit-image	Scikit-image v0.17.2	https://scikit-image.org/
Opencv-python	Opencv-python v4.6.0.66	https://pypi.org/project/opencv-python/
Pycocotools	Pycocotools v2.0.5	https://pypi.org/project/pycocotools/2.0.5/
Tqdm	Tqdm v4.64.1	https://tqdm.github.io/releases/
Pandas	Pandas v1.3.5	https://pandas.pydata.org/
Numpy	Numpy v1.21.6	https://numpy.org/
Protobuf	Protobuf v3.19.0	https://pypi.org/project/protobuf/
Tensorflow-gpu	Tensorflow-gpu v1.15.0	https://www.tensorflow.org/
cuDNN	cuDNN v7.6.5	https://developer.nvidia.com/cudnn
Cudatoolkit	Cudatoolkit v10.0.130	https://developer.nvidia.com/cuda-toolkit
Other
Codes and Datasets	Github	https://github.com/ruijunfeng/A-knowledge-integrated-deep-learning-framework-for-cellular-image-analysis-in-parasite-microbiology
Computing Platform: This protocol was performed on a computer with an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40 GHz processor, two NVIDIA 2080Ti graphics cards, and 32 GB of memory. A computer with more graphics cards is recommended to accelerate training and evaluation.	Windows 10	https://www.microsoft.com/en-au/software-download/windows10
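Because the table pins exact versions, it can be useful to confirm that an environment matches before running the protocol. The following is a minimal sketch, not part of the published code: the helper names are hypothetical, the dictionary covers only a subset of the table, and the keys are assumed to be the names under which these packages are installed with pip.

```python
# Subset of the pinned versions from the key resources table.
# The distribution names used as keys are an assumption.
EXPECTED_VERSIONS = {
    "tensorflow": "1.15.0",
    "torch": "1.2.0",
    "torchvision": "0.4.0",
    "keras": "2.2.4",
    "numpy": "1.21.6",
}

def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.7
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None

def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:  # exact string match; build suffixes count as mismatches
            mismatches[name] = (wanted, found)
    return mismatches

if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED_VERSIONS).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The `lookup` parameter lets the comparison be checked without touching the real environment, for example by passing the `get` method of a dictionary of known versions.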