At a Glance
- Tasks: Build an AI system to extract data from multilingual receipts and invoices.
- Company: Join a forward-thinking tech company focused on innovative AI solutions.
- Benefits: Competitive pay, flexible hours, and the chance to work on exciting projects.
- Why this job: Make a real impact in AI while developing cutting-edge technology.
- Qualifications: Experience in OCR, Python, and machine learning required.
- Other info: Opportunity for growth in a dynamic, collaborative environment.
The predicted salary is between £36,000 and £60,000 per year.
We want you to build an on-premise AI-powered system that extracts structured data from multi-language receipts and invoices, with a strong focus on accurate line-item extraction. The goal is to create a production-ready document understanding pipeline that combines OCR, layout-aware machine learning, and deterministic validation logic. The final system should expose an API where we can upload documents and receive structured JSON data along with visual "control images" (OCR boxes, detected regions, token labels) to support a human feedback UI. Corrected feedback should be usable for continuous model retraining.
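As a rough illustration of what the deterministic validation step could look like, the sketch below checks that each extracted line item is arithmetically consistent and that the items add up to the grand total. The field names, the one-cent tolerance, and the assumption that line totals are net amounts (so grand total = sum of line totals + tax) are all hypothetical choices for this example, not requirements from the project.

```python
from decimal import Decimal

# Assumed rounding tolerance: one cent in the document's currency.
TOLERANCE = Decimal("0.01")


def check_receipt(items, tax_amount, grand_total):
    """Return a list of problems found; an empty list means the data is consistent.

    Each item is a dict with hypothetical keys "quantity", "unit_price"
    and "line_total", all given as Decimal values.
    """
    problems = []

    # Rule 1: quantity * unit price must match the extracted line total.
    for index, item in enumerate(items):
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["line_total"]) > TOLERANCE:
            problems.append(
                f"item {index}: quantity * unit_price = {expected}, "
                f"extracted line_total = {item['line_total']}"
            )

    # Rule 2 (assumes net line totals): sum of line totals + tax = grand total.
    subtotal = sum((item["line_total"] for item in items), Decimal("0"))
    expected_total = subtotal + tax_amount
    if abs(expected_total - grand_total) > TOLERANCE:
        problems.append(
            f"totals: items + tax = {expected_total}, "
            f"extracted grand_total = {grand_total}"
        )

    return problems


if __name__ == "__main__":
    sample_items = [
        {"quantity": Decimal("2"), "unit_price": Decimal("3.50"), "line_total": Decimal("7.00")},
        {"quantity": Decimal("1"), "unit_price": Decimal("4.00"), "line_total": Decimal("4.00")},
    ]
    print(check_receipt(sample_items, Decimal("0.88"), Decimal("11.88")))
```

Using `Decimal` rather than floats keeps currency arithmetic exact, and returning a list of problems (instead of raising on the first one) would let a feedback UI show every inconsistency at once.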
Scope of Work
- Image preprocessing (deskew, denoise, PDF rendering)
- OCR with bounding boxes (PaddleOCR preferred or equivalent on-prem solution)
- Layout / region detection (header, items area, totals area)
- Fine-tuned layout-aware transformer model (LayoutXLM or LayoutLMv3 preferred)
- Token classification for fields such as:
  - Item description
  - Quantity
  - Unit price
  - Line total
  - Tax amount
  - Grand total
The system must run fully on-premise and support multilingual documents (minimum: DE, FR, IT, EN). Multi-page documents must be handled correctly.
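One detail that connects the OCR step to the layout-aware model: the LayoutLM-family models in HuggingFace Transformers expect each token's bounding box as integers scaled to a 0-1000 range relative to the page size. The sketch below shows that conversion, assuming the OCR step returns four-corner polygons in pixel coordinates. The helper names are hypothetical, and for multi-page documents the function would be applied per page using that page's own width and height.

```python
def quad_to_box(quad):
    """Convert a four-corner polygon [[x, y], ...] to an axis-aligned box."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return (min(xs), min(ys), max(xs), max(ys))


def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to integers in the 0-1000 range."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp so boxes that slightly overshoot the page stay in range.
    return [max(0, min(1000, value)) for value in scaled]


if __name__ == "__main__":
    # A made-up OCR polygon on a 1000 x 500 pixel page.
    polygon = [[100, 50], [300, 52], [299, 100], [101, 98]]
    print(normalize_box(quad_to_box(polygon), page_width=1000, page_height=500))
```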
Preferred Tech Stack
- Python
- PyTorch
- HuggingFace Transformers
- LayoutXLM or LayoutLMv3
- PaddleOCR
- OpenCV
- FastAPI
- Docker
Experience with Document AI, OCR pipelines, and production ML systems is required.
Ideal Candidate
- Hands-on experience in OCR and document layout processing
- Experience fine-tuning LayoutLM / LayoutXLM or similar models
- Strong understanding of bounding boxes and token classification
- Experience building production-ready ML systems
- Clean, modular coding style
- Bonus:
  - Experience with receipt or invoice parsing
  - Experience with table reconstruction from OCR
  - MLOps background
Please include:
- Relevant Document AI projects
- GitHub examples (if available)
- Short explanation of your approach
- Estimated timeline and rate
Contract duration of 1 to 3 months, with 40 hours per week.
Mandatory skills: OCR Algorithm, Python, Machine Learning, LayoutXLM, PaddleOCR
Project: On-Premise AI Receipt
Employer: FreelanceJobs
Contact Detail:
FreelanceJobs Recruiting Team
StudySmarter Expert Advice 🤫
We think this is how you could land Project: On-Premise AI Receipt
✨ Tip Number 1
Network like a pro! Reach out to folks in the industry, attend meetups, and connect on LinkedIn. You never know who might have the inside scoop on job openings or can refer you directly.
✨ Tip Number 2
Show off your skills! Create a portfolio showcasing your relevant projects, especially those involving OCR and document processing. This will give potential employers a taste of what you can do.
✨ Tip Number 3
Prepare for interviews by brushing up on your technical knowledge and problem-solving skills. Be ready to discuss your experience with Python, machine learning, and any specific tools mentioned in the job description.
✨ Tip Number 4
Don't forget to apply through our website! It's the best way to ensure your application gets seen. Plus, we love seeing candidates who take that extra step to engage with us directly.
Some tips for your application 🫡
Show Off Your Skills: Make sure to highlight your hands-on experience with OCR and document layout processing. We want to see how you've tackled similar projects in the past, so don't hold back on those relevant Document AI projects!
Be Clear and Concise: When explaining your approach, keep it straightforward. We appreciate clarity, so break down your thought process and make it easy for us to understand how you plan to tackle the project.
Include Your GitHub Examples: If you've got any GitHub examples, definitely include them! This gives us a peek into your coding style and the quality of your work, which is super important for us.
Apply Through Our Website: Don't forget to apply through our website! It's the best way for us to keep track of your application and ensure it gets the attention it deserves.
How to prepare for a job interview at FreelanceJobs
✨ Know Your Tech Stack
Make sure you're well-versed in the preferred tech stack mentioned in the job description. Brush up on Python, PyTorch, and the specific tools named there, such as LayoutXLM and PaddleOCR. Being able to discuss your hands-on experience with these technologies will show that you're not just familiar with them but also capable of delivering results.
✨ Showcase Relevant Projects
Prepare to talk about your previous Document AI projects. If you have GitHub examples, make them easily accessible. Highlight how your past work aligns with the requirements of building an on-premise AI system for receipt extraction. This will demonstrate your practical experience and problem-solving skills.
✨ Understand the Pipeline
Familiarise yourself with the entire document understanding pipeline, from image preprocessing to generating visual debug overlays. Be ready to discuss how you would approach each step, especially OCR and layout detection. This shows that you can think critically about the project's needs.
✨ Prepare for Technical Questions
Expect technical questions related to OCR algorithms, bounding boxes, and token classification. Practice explaining complex concepts in a simple way, as this will help you communicate effectively during the interview. Being clear and concise will make a great impression!