Company Description
Invito Software Solutions is a leading software development company providing world-class web and mobile solutions efficiently and cost-effectively.
We specialize in next-generation design patterns, responsive coding techniques, and rigorous quality assurance, resulting in high-quality apps with a high return on investment.
Our scalable services and customized engagement models cater to businesses of all sizes, from innovative startups to well-established companies.
We develop powerful and effective solutions that meet our clients' specific needs.
Project: Spring-Powered Desktop App for Invoice OCR → Clean CSV
What you’ll build
- Desktop Java application (Java 17+) backed by Spring Boot
- Runs locally (offline-first), ships as a single native installer or fat JAR.
- UI options (pick one):
  - JavaFX UI using Spring for DI (FXML controllers wired via Spring).
  - Swing UI with Spring-managed services.
  - Embedded local web UI: Spring Boot serves a UI (e.g., Vaadin or lightweight HTML/TS), shown in an embedded browser (WebView/JxBrowser).
- Invoice ingestion: drag-and-drop folders/files; PDF, PNG/JPG; multipage; batch mode.
- AI/OCR pipeline (pluggable via Spring beans):
  - Local OCR (Tesseract) + layout/zone analysis, or
  - Cloud OCR (AWS Textract, Google Vision) with retry/backoff, or
  - LLM-assisted parsing to a JSON schema with guardrails.
- Field extraction (headers + line items): vendor, invoice #, dates, currency, taxes, subtotals/totals, PO, line descriptions, qty, unit price, amounts.
- Validation & review UI: show the document preview, highlight extracted zones, flag low-confidence fields, quick edits, autocomplete.
- CSV export: stable schema; normalize number/date/locale; export per file or batch.
- Rules & heuristics: vendor templates, regex fallbacks, learned patterns, per-vendor overrides.
- Quality metrics: confidence per field, accuracy dashboards, reject reasons, simple analytics.
- Offline by default, with optional cloud connectors for OCR/LLM and template sync.
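The "regex fallbacks" item above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's actual rules: the class name `RegexFallbackExtractor`, the two patterns, and the field keys `invoiceNumber` and `total` are all hypothetical, and real vendor templates would likely override them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative regex fallback for header fields; patterns and field keys are assumptions. */
final class RegexFallbackExtractor {

    // Matches e.g. "Invoice No: INV-2024-001", "Invoice # 4711", "Invoice Number - A/17".
    private static final Pattern INVOICE_NUMBER = Pattern.compile(
            "(?i)\\binvoice\\s*(?:no\\b\\.?|number\\b|#)\\s*[:\\-]?\\s*([A-Z0-9][A-Z0-9\\-/]*)");

    // Matches a line that starts with "Total" (so not "Subtotal"),
    // skipping up to four non-digit characters such as a currency symbol or code.
    private static final Pattern TOTAL = Pattern.compile(
            "(?im)^\\s*total\\s*(?:due)?\\s*[:\\-]?\\s*[^\\d\\-]{0,4}([\\d.,]+)");

    /** Returns only the fields that were found; values are raw text, not yet normalized. */
    static Map<String, String> extract(String ocrText) {
        Map<String, String> fields = new LinkedHashMap<>();
        putIfFound(fields, "invoiceNumber", INVOICE_NUMBER, ocrText);
        putIfFound(fields, "total", TOTAL, ocrText);
        return fields;
    }

    private static void putIfFound(Map<String, String> out, String key, Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        if (matcher.find()) {
            out.put(key, matcher.group(1));
        }
    }
}
```

The extracted amount is still in the invoice's own number format, so it would need locale normalization before export.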
Architecture (Spring-centric)
- App launcher: desktop entry point that bootstraps SpringApplication.
- Core modules (Spring beans):
  - IngestionService: drag-and-drop, PDF/image decoding, page splitting.
  - OcrService (strategy): TesseractOcrService, TextractOcrService, GcvOcrService.
  - ParsingService: layout analysis, key-value detection, tables; optional LlmParsingService.
  - TemplateService: vendor profiles, regex rules, learned mappings; local cache + optional remote sync.
  - ValidationService: confidence scoring, anomaly detection, suggestions.
  - ExportService: CSV writer (stable schema, locale normalization).
  - MetricsService: capture confidence, errors, durations; local storage (SQLite/H2).
  - DocumentPreviewService: render + zone overlays (PDFBox + image layers).
- Persistence: Spring Data (SQLite/H2 on disk) for runs, templates, audits.
- Config: Spring profiles (offline, cloud), YAML config for OCR provider keys, thresholds, CSV schema.
- UI layer:
  - If JavaFX: Controllers get beans via Spring (custom SpringFXMLLoader), reactive updates via ApplicationEventPublisher.
  - If Swing: @Component panels wired to services, event bus for updates.
  - If Vaadin (embedded web): served by Spring Boot; package app with an embedded browser window.
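The "OcrService (strategy)" module above might be shaped like the plain-Java sketch below. It deliberately avoids Spring so it runs standalone; in the real app the implementations would presumably be Spring beans selected by profile. `OcrResult`, `StubOcrService`, `OcrServiceRegistry`, and the provider names are hypothetical, and the stub returns canned text instead of calling a real OCR engine.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** OCR output for one page: text plus an engine-reported confidence in [0, 1]. */
record OcrResult(String engine, String text, double confidence) {}

/** Strategy interface that each OCR backend would implement. */
interface OcrService {
    String providerName();
    OcrResult recognize(byte[] pageImage);
}

/** Placeholder backend returning canned text; a real one would call Tesseract, Textract, or Vision. */
final class StubOcrService implements OcrService {
    private final String name;
    private final String cannedText;

    StubOcrService(String name, String cannedText) {
        this.name = name;
        this.cannedText = cannedText;
    }

    @Override public String providerName() { return name; }

    @Override public OcrResult recognize(byte[] pageImage) {
        return new OcrResult(name, cannedText, 1.0);
    }
}

/** Picks a strategy by configured provider name, falling back to a default provider. */
final class OcrServiceRegistry {
    private final Map<String, OcrService> byName = new LinkedHashMap<>();
    private final OcrService defaultService;

    OcrServiceRegistry(List<OcrService> services, String defaultProvider) {
        for (OcrService service : services) {
            byName.put(service.providerName(), service);
        }
        defaultService = byName.get(defaultProvider);
        if (defaultService == null) {
            throw new IllegalArgumentException("Unknown default provider: " + defaultProvider);
        }
    }

    OcrService select(String configuredProvider) {
        return byName.getOrDefault(configuredProvider, defaultService);
    }
}
```

With this shape, downstream services depend only on `OcrService`, so switching providers does not touch parsing or export code.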
Key user flows
- Drop files/folders → Ingestion queue (progress bar, cancel/retry).
- OCR + Parsing → Field map + line items + confidence per field.
- Review screen → Document preview with highlight boxes, editable fields, low-confidence badges, autocomplete from templates.
- Approve/Reject → Approved invoices go to the export queue; rejected invoices record a reject reason.
- Export → Single CSV or per-invoice CSV; schema versioning; logs.
- Analytics → Success rate, average confidence, common reject reasons, vendor leaderboard.
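The "low-confidence badges" step in the review flow can be illustrated with a simple threshold check. This is only a sketch: `ExtractedField`, `ReviewFlags`, and the 0.80 threshold used in the usage note are assumptions, and the actual ValidationService may score fields differently.

```java
import java.util.List;

/** One extracted value with a confidence in [0, 1]; the shape is an assumption for illustration. */
record ExtractedField(String name, String value, double confidence) {}

final class ReviewFlags {

    /** Returns the names of fields whose confidence is below the threshold, in input order. */
    static List<String> lowConfidence(List<ExtractedField> fields, double threshold) {
        return fields.stream()
                .filter(field -> field.confidence() < threshold)
                .map(ExtractedField::name)
                .toList();
    }

    /** An invoice needs manual review as soon as at least one field is flagged. */
    static boolean needsReview(List<ExtractedField> fields, double threshold) {
        return !lowConfidence(fields, threshold).isEmpty();
    }
}
```

For example, with fields `vendorName` (0.98), `invoiceNumber` (0.62), and `total` (0.91) and a threshold of 0.80, only `invoiceNumber` would be flagged for review.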
CSV schema (stable, versioned)
- schemaVersion, fileId, vendorName, invoiceNumber, invoiceDateISO, dueDateISO, currency, subtotal, tax, total, poNumber, …
- Line items (denormalized or separate CSV): lineIndex, description, qty, unitPrice, amount.
- Locale-safe formatting (ISO dates, dot decimal).
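Locale-safe formatting could be handled along these lines. This is a sketch, not the ExportService itself: `CsvNormalizer` and its method names are hypothetical, the source locale and date pattern are assumed to come from the vendor template, and the quoting follows the usual RFC 4180 convention.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class CsvNormalizer {

    /** Parses an amount written in the source locale and returns it with a dot decimal, no grouping. */
    static String normalizeAmount(String raw, Locale sourceLocale) {
        DecimalFormat format = new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(sourceLocale));
        format.setParseBigDecimal(true); // avoid double rounding on money values
        String trimmed = raw.trim();
        ParsePosition position = new ParsePosition(0);
        Number parsed = format.parse(trimmed, position);
        // Reject partial parses such as "12abc" instead of silently exporting "12".
        if (!(parsed instanceof BigDecimal) || position.getIndex() != trimmed.length()) {
            throw new IllegalArgumentException("Not a valid amount for " + sourceLocale + ": " + raw);
        }
        return ((BigDecimal) parsed).toPlainString();
    }

    /** Parses a date using the given source pattern and returns it as ISO yyyy-MM-dd. */
    static String normalizeDate(String raw, String sourcePattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(sourcePattern, Locale.ROOT);
        return LocalDate.parse(raw.trim(), formatter).toString();
    }

    /** Quotes a field if it contains a comma, quote, or line break; embedded quotes are doubled. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Joins already-normalized fields into one CSV row (without a line terminator). */
    static String toRow(List<String> fields) {
        return fields.stream().map(CsvNormalizer::escape).collect(Collectors.joining(","));
    }
}
```

For instance, `normalizeAmount("1.234,56", Locale.GERMANY)` and `normalizeAmount("1,234.56", Locale.US)` both yield `1234.56`, and `normalizeDate("31.12.2024", "dd.MM.yyyy")` yields `2024-12-31`.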
Offline/Cloud strategy
- Offline: Tesseract + local templates; everything runs without internet.
- Cloud (optional): switch OCR/LLM beans via Spring profile/env; graceful fallback to offline if unavailable.
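One way to read "graceful fallback to offline" is: retry the cloud call a few times with exponential backoff, then hand the page to the local engine. The sketch below assumes exactly that; `OcrFallback`, its method names, and the doubling delay are illustrative choices, and failures are modeled as unchecked exceptions for brevity.

```java
import java.util.function.Supplier;

final class OcrFallback {

    /** Calls the supplier up to maxAttempts times, doubling the delay after each failure. */
    static <T> T withRetry(Supplier<T> call, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw lastFailure;
    }

    /** Tries the cloud provider with retries; if it still fails, runs the offline provider once. */
    static <T> T cloudThenOffline(Supplier<T> cloud, Supplier<T> offline,
                                  int maxAttempts, long initialDelayMillis) {
        try {
            return withRetry(cloud, maxAttempts, initialDelayMillis);
        } catch (RuntimeException cloudFailure) {
            return offline.get();
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("Interrupted while backing off", e);
        }
    }
}
```

A production version would probably retry only on transient errors (timeouts, throttling) and add jitter to the delay; both are left out here to keep the sketch short.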
Packaging & ops
- Distribution: jpackage/native installers (Win/MSI, macOS/DMG, Linux/DEB/RPM) or fat JAR.
- Logging: Spring Boot logging; per-invoice audit trail; export logs.
- Updates: optional auto-update check (profile-gated).
Security & privacy
- Local processing by default; redact PII in logs; encrypted at-rest store (configurable).
- For cloud calls: minimal payloads, signed requests, regional endpoints.
Nice-to-haves
- Hot keys and batch review UX.
- Template “learn” button: convert manual fixes into a saved vendor rule.
- Import/export of templates.
- Headless CLI mode: --input dir --output csv.
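Argument handling for the headless mode might be as small as the sketch below. It assumes both `--input` and `--output` take a path value, which the brief does not spell out; `CliOptions` and `CliArgs` are hypothetical names, and a real implementation could equally use a CLI library.

```java
import java.nio.file.Path;

/** Parsed command-line options; treating both values as paths is an assumption. */
record CliOptions(Path input, Path output) {}

final class CliArgs {

    /** Parses "--input <path> --output <path>" in any order and rejects anything else. */
    static CliOptions parse(String[] args) {
        Path input = null;
        Path output = null;
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--input" -> {
                    input = Path.of(requireValue(args, i));
                    i++; // skip the value we just consumed
                }
                case "--output" -> {
                    output = Path.of(requireValue(args, i));
                    i++;
                }
                default -> throw new IllegalArgumentException("Unknown argument: " + args[i]);
            }
        }
        if (input == null || output == null) {
            throw new IllegalArgumentException("Usage: --input <path> --output <path>");
        }
        return new CliOptions(input, output);
    }

    private static String requireValue(String[] args, int flagIndex) {
        if (flagIndex + 1 >= args.length) {
            throw new IllegalArgumentException("Missing value for " + args[flagIndex]);
        }
        return args[flagIndex + 1];
    }
}
```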
Deliverables
- Source code (Spring Boot project + UI layer).
- Packaged desktop installer(s).
- Sample vendor templates & test invoices.
- README covering setup, Spring profiles, and OCR provider configuration.
- Test suite (unit + a few end-to-end fixtures).
- Short user guide (drop → review → export).
Qualifications
- Strong proficiency in Java and Spring Boot (DI, Spring Data, configuration, profiles).
- Experience building desktop UIs in JavaFX, Swing, or Vaadin (served by Spring) with responsive, user-friendly design.
- Skilled in troubleshooting, debugging, and performance tuning in Spring/Java.
- Familiar with OCR/AI integrations (Tesseract, Textract, Vision, OpenAI/Vertex) and robust parsing.
- Version control with Git; excellent communication; ability to work independently/remote.
- Bachelor’s in CS/Engineering (or equivalent experience).
- Freelance/contract experience is a plus.