(pronounced: ai·ris | aɪ.rɪs)
"AI That Opens Eyes"
A New Dimension of Awareness
AIris is not merely a tool; it is a paradigm shift in assistive technology for the visually impaired. Our mission is to deliver instantaneous, contextual awareness of the visual world, empowering users with an unprecedented level of freedom and independence. Where other tools offer a glimpse, AIris delivers sight.
Development Team
Rajin Khan (2212708042) & Saumik Saha Kabbya (2211204042)
North South University | CSE 499A/B Senior Capstone Project
Bridging the Visual Gap
Current assistive technologies are a compromise—slow, costly, and tethered to the cloud. They offer fragmented data, not holistic understanding. We identified four critical failures to overcome.
High Latency
5+ second delays and complex interactions break immersion and utility.
Cost Barriers
Proprietary hardware and expensive cloud APIs limit accessibility.
Cloud Dependency
No internet means no functionality, creating a fragile reliance on connectivity.
Context Gap
Static image analysis fails to understand user intent or the dynamics of an environment.
The AIris Solution
An elegant, purpose-built wearable that delivers sub-2-second, offline-first, context-aware descriptions. It is a quiet companion, a real-time narrator, and a bridge to visual freedom.
Instant Analysis
Sub-2-second response from a single button press to audio description. No apps, no menus, just instant awareness.
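As a sketch of the loop we are engineering toward: a single button press triggers capture, description, and speech. This is illustrative rather than final firmware; it assumes a push button wired to BCM pin 17, a placeholder describe_scene() standing in for the vision-language model call, and the espeak CLI for audio output.

```python
import subprocess
import time

import RPi.GPIO as GPIO
from picamera2 import Picamera2

BUTTON_PIN = 17  # assumption: push button on BCM pin 17, active-low

def describe_scene(image_path: str) -> str:
    """Placeholder for the local vision-language model call."""
    return "A sidewalk ahead with a bicycle parked on the left."

def main() -> None:
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

    camera = Picamera2()
    camera.configure(camera.create_still_configuration())
    camera.start()  # keep the camera warm so capture latency stays low

    try:
        while True:
            GPIO.wait_for_edge(BUTTON_PIN, GPIO.FALLING)  # block until press
            start = time.monotonic()
            camera.capture_file("/tmp/frame.jpg")
            description = describe_scene("/tmp/frame.jpg")
            subprocess.run(["espeak", description])  # assumes espeak installed
            print(f"press-to-speech latency: {time.monotonic() - start:.2f}s")
    finally:
        GPIO.cleanup()

if __name__ == "__main__":
    main()
```

Timing the whole loop with time.monotonic() is how we hold ourselves to the sub-2-second budget at every stage of development.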
Edge AI Processing
Local-first approach on a Raspberry Pi 5 ensures privacy, low latency, and functionality without an internet connection.
Safety Prioritized
The AI engine is designed to identify potential hazards, such as obstacles, traffic, and steps, and to announce them before anything else.
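Even when the underlying model returns an unordered description, hazard-first delivery can be enforced in software. A minimal sketch, assuming a hypothetical hand-picked HAZARD_TERMS keyword list (a production system would use a tuned classifier or structured model output instead):

```python
import re

# Assumption: an illustrative keyword list, not a vetted safety taxonomy.
HAZARD_TERMS = ("obstacle", "traffic", "car", "step", "stairs", "hole", "bike")

def prioritize_hazards(description: str) -> str:
    """Reorder sentences so any that mention a hazard are spoken first."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", description) if s.strip()]
    hazards = [s for s in sentences if any(t in s.lower() for t in HAZARD_TERMS)]
    rest = [s for s in sentences if s not in hazards]
    return " ".join(hazards + rest)

print(prioritize_hazards(
    "A quiet street with trees. A car is approaching from the right. "
    "There is a curb step ahead."
))
# -> "A car is approaching from the right. There is a curb step ahead.
#     A quiet street with trees."
```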
Human-First Design
A lightweight, comfortable, and discreet form factor designed for all-day wear, with private audio delivery.
Grounding Our Vision in Research
The AIris project is built upon a solid foundation of academic and applied research. Our review of existing literature validates our architectural choices and highlights our key contributions to the field of assistive technology.
Key Research Gaps Addressed
| Research Gap Identified | How AIris Addresses the Gap |
| --- | --- |
| High Latency & Cloud Dependency | An offline-first architecture on a Raspberry Pi 5 ensures sub-2-second response times, eliminating reliance on internet connectivity. |
| Lack of Contextual Understanding | Integration of modern Vision-Language Models (LLaVA, BLIP-2) provides rich, human-like descriptions, moving beyond simple object lists. |
| High Cost & Poor Accessibility | A targeted hardware budget under $160 USD and an open-source philosophy make the technology vastly more accessible than commercial alternatives. |
| On-Device Performance Limitations | Targeted hardware/software co-design, including model quantization and memory management, is a core development phase, not an afterthought. |
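One concrete example of the quantization lever named in the last row: PyTorch's dynamic quantization converts Linear-layer weights to int8, shrinking memory and often improving latency on CPU-only devices like the Pi 5. This is an illustrative sketch, not the project's final optimization path; deployment may instead rely on pre-quantized weights served through Ollama.

```python
import torch
import torch.nn as nn

# Toy stand-in for a model's linear layers; the real target would be the
# transformer blocks of the chosen vision-language model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))

# Dynamic quantization: weights stored as int8, activations quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.inference_mode():
    out = quantized(torch.randn(1, 512))
print(out.shape)  # torch.Size([1, 256])
```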
References
- Naayini, P., et al. (2025). AI-Powered Assistive Technologies for Visual Impairment.
- Wang, L., & Wong, A. (2019). Enabling Computer Vision Driven Assistive Devices... (foundational work)
- Elmannai, W., & Elleithy, K. (2017). Sensor-Based Assistive Devices for Visually-Impaired People... (foundational work)
- Liu, H., et al. (2023). Visual Instruction Tuning (LLaVA).
- Li, J., et al. (2023). BLIP-2: Bootstrapping Language-Image Pre-training...
Anatomy of Instant Vision
Our modular architecture separates the system into a wearable Spectacle Unit and a powerful Pocket Unit. This core design is flexible, allowing for multiple physical form factors.
Spectacle Unit
Pocket Unit
Conceptual Form Factors


Our Technology Stack
We are leveraging a state-of-the-art technology stack, chosen for performance on edge devices. This is not just a concept; it is an engineered system.
AI Model Evaluation
Benchmarking multiple vision-language models to find the optimal balance of speed, accuracy, and resource usage for local deployment.
- LLaVA-v1.5: Primary for balanced local performance.
- BLIP-2: Used as an accuracy benchmark.
- Groq API: For high-speed cloud fallback.
- Ollama: For flexible local LLM hosting.
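The local-first/cloud-fallback split sketched below shows how these pieces might fit together. It assumes an Ollama server on localhost with a LLaVA model already pulled; the prompt text and the Groq fallback stub are illustrative, not the project's final implementation.

```python
import base64

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def describe_locally(image_path: str, model: str = "llava",
                     timeout_s: float = 10.0) -> str:
    """Ask a local vision-language model (served by Ollama) to describe a frame."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": model,  # assumes e.g. `ollama pull llava` has been run
            "prompt": "Describe this scene for a blind pedestrian. Hazards first.",
            "images": [image_b64],
            "stream": False,
        },
        timeout=timeout_s,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def describe_via_groq(image_path: str) -> str:
    """Hypothetical cloud fallback; wiring up the Groq client is out of scope here."""
    raise NotImplementedError("requires network access and a Groq API key")

def describe_scene(image_path: str) -> str:
    """Local-first: prefer the on-device model, fall back to the cloud on failure."""
    try:
        return describe_locally(image_path)
    except requests.RequestException:
        return describe_via_groq(image_path)
```

Keeping both paths behind one describe_scene() signature lets the rest of the pipeline stay agnostic to where inference actually ran.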
Software Stack
Built on a robust Python foundation, utilizing industry-standard libraries for computer vision, AI, and hardware interfacing.
- Python 3.11+ (Core Language)
- PyTorch 2.0+ (AI Framework)
- OpenCV (Computer Vision)
- RPi.GPIO & picamera2 (Hardware Control)
Current Development Status
We are in the active prototyping and testing phase, using a web interface to rapidly evaluate and optimize different multimodal AI models before hardware integration.
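Latency comparisons of this kind reduce to a small harness. A minimal sketch, assuming candidate models are served locally and that a describe(image_path, model) callable, such as the describe_locally() sketch above, is supplied; the model tags listed are illustrative.

```python
import statistics
import time
from typing import Callable

# Illustrative Ollama model tags; the real shortlist comes out of this evaluation.
CANDIDATES = ["llava:7b", "llava:13b", "bakllava"]

def benchmark(describe: Callable[[str, str], str], model: str,
              image_path: str, runs: int = 5) -> float:
    """Return the median end-to-end latency in seconds for one model on one image."""
    samples = []
    for _ in range(runs):
        start = time.monotonic()
        describe(image_path, model)
        samples.append(time.monotonic() - start)
    return statistics.median(samples)

# Usage with the describe_locally() helper from the earlier sketch:
# for tag in CANDIDATES:
#     print(f"{tag}: {benchmark(describe_locally, tag, 'scenes/street.jpg'):.2f}s")
```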


Budget & Portability
Accessibility includes affordability. We have sourced components so that a baseline build stays under our ৳17,000 target for the Bangladesh market, without sacrificing the core mission of complete portability.
| Component Category | Cost Range (BDT) | Est. Weight |
| --- | --- | --- |
| Core Computing (Pi 5, SD Card) | ৳10,600 - ৳12,600 | ~200 g |
| Portable Power (Power Bank, Cables) | ৳2,350 - ৳3,600 | ~400 g |
| Camera & Audio System | ৳1,980 - ৳3,470 | ~150 g |
| Control & Housing | ৳955 - ৳1,910 | ~180 g |
| TOTAL ESTIMATE (Target < ৳17,000) | ৳15,885 - ৳21,580 | ~930 g |
Two Phases of Innovation
Phase 1 (CSE 499A): Software Foundation & AI Integration. This phase involves deep research into lightweight vision-language models, benchmarking their performance on the Raspberry Pi 5, building the core scene-description engine, and optimizing the entire software pipeline for latency and efficiency.
Phase 2 (CSE 499B): Hardware Integration & User Experience. This phase brings the project into the physical world. We will 3D-model and print the custom enclosures, assemble the complete wearable system, and conduct extensive field testing with users to gather feedback and refine the final product.
Exceeding Course Outcomes
This project is meticulously designed to meet and exceed the learning outcomes for the CSE 499A/B Senior Capstone course.
Problem & Design: We identify a real-world engineering problem and design a complete, constrained hardware/software system to meet desired needs.
Modern Tools: We leverage a modern stack including Python, PyTorch, modern AI models, and embedded systems.
Constraint Validation: Our budget addresses economic constraints, the offline-first design addresses privacy, and the core function prioritizes safety.
Defense & Documentation: This experience, along with our detailed documentation, fulfills all reporting and defense requirements.
AIris
Thank you.