Week 7
Milestones
- Experiments with CLIP: drew confusion matrices comparing the classification errors of CLIP and Tesseract. Result: CLIP performs better (a plotting sketch follows below).
 
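A minimal sketch of how such a comparison might be plotted with scikit-learn; the label lists below are hypothetical placeholders, and in the real experiment they would come from the cropped-word dataset and the two OCR pipelines:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# hypothetical placeholder labels; the real ones come from the dataset
y_true      = ["cat", "dog", "cat", "bird"]
y_pred_clip = ["cat", "dog", "cat", "dog"]
y_pred_tess = ["cat", "dog", "bird", "dog"]

labels = sorted(set(y_true))
for name, y_pred in [("CLIP", y_pred_clip), ("Tesseract", y_pred_tess)]:
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    ConfusionMatrixDisplay(cm, display_labels=labels).plot()
    plt.title(f"{name} confusion matrix")
plt.show()
```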
- Set up a git repository, shrivastava95/clip-ocr, for fine-tuning CLIP on a given dataset.
- Created a dataset of cropped word images from some pages of en-or.pdf.
- Implemented the base zero-shot approach (see the first sketch after this list).
- Implemented CoOp (https://arxiv.org/abs/2109.01134) as a cheaper alternative to fine-tuning CLIP (see the second sketch after this list).
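A minimal sketch of the zero-shot approach, assuming OpenAI's clip package; the vocabulary, prompt template, and image path are hypothetical:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

vocab = ["hello", "world", "week"]  # hypothetical word vocabulary
text = clip.tokenize([f'a photo of the word "{w}"' for w in vocab]).to(device)
image = preprocess(Image.open("word_crop.png")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)  # image-text similarities
    probs = logits_per_image.softmax(dim=-1)  # distribution over the vocabulary

print("predicted word:", vocab[probs.argmax().item()])
```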
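And a condensed sketch of CoOp following the paper's recipe: a small set of continuous context vectors is spliced in front of each class-name embedding and trained while the whole CLIP backbone stays frozen. The class names and n_ctx below are placeholders:

```python
import torch
import torch.nn as nn
import clip

class PromptLearner(nn.Module):
    """Learns n_ctx continuous context vectors shared across all classes;
    these are the only trainable parameters (the CoOp recipe)."""
    def __init__(self, clip_model, classnames, n_ctx=16):
        super().__init__()
        dtype = clip_model.dtype
        ctx_dim = clip_model.ln_final.weight.shape[0]
        # randomly initialized learnable context vectors
        self.ctx = nn.Parameter(torch.empty(n_ctx, ctx_dim, dtype=dtype))
        nn.init.normal_(self.ctx, std=0.02)
        # tokenize "X X ... X <classname>" so we know where to splice the
        # learned context into the token embedding sequence
        prompts = [" ".join(["X"] * n_ctx) + " " + name for name in classnames]
        self.register_buffer("tokenized", torch.cat([clip.tokenize(p) for p in prompts]))
        with torch.no_grad():
            emb = clip_model.token_embedding(self.tokenized).type(dtype)
        self.register_buffer("prefix", emb[:, :1, :])          # SOS token
        self.register_buffer("suffix", emb[:, 1 + n_ctx:, :])  # class name + EOS

    def forward(self):
        ctx = self.ctx.unsqueeze(0).expand(self.prefix.shape[0], -1, -1)
        return torch.cat([self.prefix, ctx, self.suffix], dim=1)

def encode_text(clip_model, prompt_embeddings, tokenized):
    """Run pre-built prompt embeddings through CLIP's frozen text tower."""
    x = prompt_embeddings + clip_model.positional_embedding.type(clip_model.dtype)
    x = x.permute(1, 0, 2)  # NLD -> LND for the transformer
    x = clip_model.transformer(x)
    x = x.permute(1, 0, 2)  # LND -> NLD
    x = clip_model.ln_final(x).type(clip_model.dtype)
    # take the features at the EOT token (highest token id) position
    eot = tokenized.argmax(dim=-1)
    return x[torch.arange(x.shape[0]), eot] @ clip_model.text_projection

# usage: classnames would be the word vocabulary from the cropped-word dataset
model, _ = clip.load("ViT-B/32", device="cpu")
learner = PromptLearner(model, ["cat", "dog"], n_ctx=4)
text_features = encode_text(model, learner(), learner.tokenized)
```

During training only learner.ctx receives gradients; the usual CoOp objective is cross-entropy between the image features and these class text features.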
 
Screenshots / Videos
Contributions
Learnings
- Learnt about OpenAI's CLIP model, a zero-shot model for measuring semantic similarity between image and text pairs.
- This is done by computing the cosine similarity of their projections into a common embedding space (a sketch follows this list).
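A minimal sketch of that similarity computation, assuming OpenAI's clip package; the image path and caption are hypothetical:

```python
import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")

image = preprocess(Image.open("word_crop.png")).unsqueeze(0)
text = clip.tokenize(["a photo of the word hello"])

with torch.no_grad():
    img_feat = model.encode_image(image)  # projection into the joint space
    txt_feat = model.encode_text(text)    # projection into the same space

# normalize so the dot product equals the cosine similarity
img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
print(f"cosine similarity: {(img_feat @ txt_feat.T).item():.3f}")
```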