A digital image is just a grid of tiny dots called pixels. Each pixel has a color. In color images, each pixel has three values: Red, Green, and Blue. This is called RGB. In grayscale images, each pixel has just one value that shows how light or dark it is.

Image source: https://www.researchgate.net/figure/A-three-dimensional-RGB-matrix-Each-layer-of-the-matrix-is-a-two-dimensional-matrix-of_fig6_267210444
Think of it like a spreadsheet. Each cell is a pixel. Instead of numbers like sales or costs, the cells hold brightness or color info.
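To make the spreadsheet picture concrete, here is a minimal sketch using NumPy arrays. The array names and pixel values are made up for illustration, and it assumes the common 8-bit convention where each value runs from 0 to 255.

```python
import numpy as np

# A 2x3 grayscale image: one brightness value per pixel
# (assuming 8-bit values, where 0 is black and 255 is white).
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# A 2x3 color image: three values (R, G, B) per pixel.
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]   # top-left pixel is pure red
rgb[1, 2] = [0, 0, 255]   # bottom-right pixel is pure blue

print(gray.shape)            # (2, 3)
print(rgb.shape)             # (2, 3, 3)
print(rgb[0, 0].tolist())    # [255, 0, 0]
```

The grayscale image is a single sheet of cells, while the color image is three such sheets stacked together, one each for red, green, and blue.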
Geometric transformations are ways to change the shape or position of an image.
This is like folding or rotating a photo on your desk. They can also be used for data augmentation!
Data augmentation means making your training set bigger by creating new versions of your images.
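As a rough sketch of what changing the shape or position of an image looks like in code, the example below applies a few NumPy operations to a tiny made-up image. The choice of operations (flip, rotate, shift) is an assumption for illustration, not a complete list.

```python
import numpy as np

# A tiny 2x3 grayscale "image" so the effect of each operation is easy to read.
img = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.uint8)

flipped_lr = np.fliplr(img)              # mirror left-to-right
flipped_ud = np.flipud(img)              # mirror top-to-bottom
rotated = np.rot90(img)                  # rotate 90 degrees counterclockwise
shifted = np.roll(img, shift=1, axis=1)  # shift one pixel right, wrapping around

print(flipped_lr.tolist())  # [[3, 2, 1], [6, 5, 4]]
print(flipped_ud.tolist())  # [[4, 5, 6], [1, 2, 3]]
print(rotated.tolist())     # [[3, 6], [2, 5], [1, 4]]
print(shifted.tolist())     # [[3, 1, 2], [6, 4, 5]]
```

Note that the pixel values never change here. Only where they sit in the grid changes.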
Ways to augment:

Image source: https://www.ibm.com/think/topics/data-augmentation
Why? Seeing more varied examples helps the model learn patterns that generalize, so it is less likely to overfit (memorize) the training set.
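Here is a minimal sketch of how augmentation can grow a training set. The `augment` helper and the random stand-in images are hypothetical, and the flips and rotations are just one possible set of transformations chosen for illustration.

```python
import numpy as np

def augment(image):
    """Return new versions of `image` (hypothetical helper for illustration)."""
    return [
        np.fliplr(image),      # mirrored left-to-right
        np.flipud(image),      # mirrored top-to-bottom
        np.rot90(image, k=1),  # rotated 90 degrees counterclockwise
        np.rot90(image, k=2),  # rotated 180 degrees
    ]

# Stand-in training set: three random 4x4 grayscale images.
rng = np.random.default_rng(0)
train = [rng.integers(0, 256, size=(4, 4), dtype=np.uint8) for _ in range(3)]

# Keep the originals and add the new versions of each image.
augmented = list(train)
for im in train:
    augmented.extend(augment(im))

print(len(train), len(augmented))  # 3 15
```

Each original image yields four extra versions, so three images become fifteen. In practice the new versions should still look like plausible inputs for the task, for example a flipped cat is still a cat, but a flipped digit may not be the same digit.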