The UNet architecture is a popular choice for image segmentation tasks. This implementation includes training and validation scripts, as well as utilities for data handling and model evaluation.
The network architecture follows the original U-Net paper, "U-Net: Convolutional Networks for Biomedical Image Segmentation", with a few differences:
- Same-padding convolutions are used instead of valid convolutions in the double-conv block, so the output size matches the input size.
- Several activation functions are explored: ReLU, GELU, SiLU. I also define a new activation named 'PySiLU', a polynomial version of SiLU.
- Center-crop is used in the skip connections between the down blocks and up blocks.
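As a rough sketch of the first point, a same-padding double-conv block in PyTorch might look like the following. The class name, layer order, and use of BatchNorm are illustrative assumptions, not the repo's exact code:

```python
import torch
from torch import nn

class DoubleConv(nn.Module):
    """Two 3x3 'same' convolutions (padding=1), each followed by
    BatchNorm and an activation. With padding=1 and kernel_size=3,
    spatial dimensions are preserved, unlike the paper's valid convs."""

    def __init__(self, in_ch: int, out_ch: int, act=nn.ReLU):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            act(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            act(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```

Because the spatial size is preserved, the final segmentation map has the same height and width as the input image, and no output cropping is needed.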
PySiLU is an activation I defined myself; its curve is similar to SiLU's.
The formula of this activation is as follows:
See the code in `activation.py`. To plot the activation curve, uncomment the line `# utils.show_activation_curve(pysilu)` and run `python activation.py`.
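For reference, since the PySiLU definition itself lives in `activation.py`, here is only the standard SiLU (swish) that it approximates, written as a minimal scalar sketch:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x).
    Near 0 it is smooth; for large positive x it approaches x,
    and for large negative x it approaches 0."""
    return x * sigmoid(x)
```

A polynomial version would replace the `sigmoid` call with a polynomial approximation, which can be cheaper to evaluate; see `activation.py` for the actual PySiLU definition.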
- Python 3.6+
- PyTorch
- Albumentations
- tqdm
- NumPy
- matplotlib
- Pandas
- Clone the repository:
```shell
git clone ~
cd Unet-From-Scratch
```
- Install dependencies:
```shell
pip install -r requirements.txt
```
- Download dataset from Kaggle.
- Prepare your train and test data with `process_data.py`. This will output two csv files under the `data` directory (`train_desc.csv`, `test_desc.csv`). You can also use my files.
- Make the necessary directories:
```shell
mkdir imgs
mkdir metric_logs
```
- Train your model. Check the parameters in `train.py`, then run `python train.py`.
The formula of accuracy is as follows:

`Accuracy = (TP + TN) / (TP + TN + FP + FN)`

I assume you all know this metric, so there is no further introduction to accuracy.

The formula of the dice score is as follows:

`Dice = 2 * TP / (2 * TP + FP + FN)`
Here is an implementation of the dice score with PyTorch.

```python
def dice_score(pred, mask):
    up = 2 * (pred * mask).sum()
    down = (pred + mask).sum()
    return up / down
```

The idea is:
- `pred * mask` is the elementwise 'logical and' of `pred` and `mask`. By summing it, you get the `TP` value.
- An example: `pred = Tensor([1.0, 1.0, 0.0, 1.0])`, `mask = Tensor([1.0, 0.0, 0.0, 1.0])`. Executing `pred + mask` gives `Tensor([2.0, 1.0, 0.0, 2.0])`. Where `pred[i]` and `mask[i]` are both `1` (a true positive), `pred + mask` yields `2`, matching the `2 * TP` term. Where both are `0` (a true negative), it yields `0`, which contributes nothing. Where exactly one of `pred[i]` and `mask[i]` is `1` (a false positive or false negative), it yields `1`, matching the `FP + FN` term.
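Plugging the example tensors into `dice_score` confirms the arithmetic: `TP = 2`, so the numerator is `4` and the denominator is `5`, giving a dice score of `0.8`. The function is repeated here so the snippet is self-contained:

```python
import torch

def dice_score(pred, mask):
    # numerator: 2 * TP; denominator: 2*TP + FP + FN
    up = 2 * (pred * mask).sum()
    down = (pred + mask).sum()
    return up / down

pred = torch.tensor([1.0, 1.0, 0.0, 1.0])
mask = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(dice_score(pred, mask).item())  # 0.8
```

Note that `down` is zero when both `pred` and `mask` are all zeros; implementations often add a small smoothing constant to both terms to avoid division by zero.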
Experiments were run for 5 epochs with random seed 20.
| Activation | Train Dice Score | Eval Dice Score |
|---|---|---|
| ReLU | 0.9552 | 0.9295 |
| GeLU | 0.9559 | 0.9629 |
| SiLU | 0.9556 | 0.9601 |
| PySiLU | 0.9545 | 0.9602 |
The figure shows the dice score curves for the different activation functions over 5 epochs. One can see that the ReLU activation function appears to perform worst among them.
Some inference result visualizations:
```
@misc{AndrewGuan-UNet,
  author = {Guan, Zhongchao},
  title = {torch-unet-from-Scratch},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AllenWrong/From-Scratch/unet}},
}
```
- This implementation is based on the UNet architecture.
- The dataset used for training and testing is from the Kaggle Carvana Image Masking Challenge.
If you are interested in my project or want to know more about the from-scratch series, follow me on GitHub.
If you have some ideas you'd like to bring to life, please email me.
- 📧Email me: [email protected]
- Follow me on LinkedIn.





