(pronounced: ai·ris | /ˈaɪ.rɪs/)

"AI That Opens Eyes"

Chapter I   The Vision

A New Dimension of Awareness

AIris is not merely a tool; it is a paradigm shift in assistive technology for the visually impaired. Our mission is to deliver instantaneous, contextual awareness of the visual world, empowering users with an unprecedented level of freedom and independence. Where other tools offer a glimpse, AIris delivers sight.

Development Team

Rajin Khan (2212708042) & Saumik Saha Kabbya (2211204042)
North South University | CSE 499A/B Senior Capstone Project

Chapter II   The Challenge

Bridging the Visual Gap

Current assistive technologies are a compromise—slow, costly, and tethered to the cloud. They offer fragmented data, not holistic understanding. We identified four critical failures to overcome.

High Latency

5+ second delays and complex interactions break immersion and utility.

Cost Barriers

Proprietary hardware and expensive cloud APIs limit accessibility.

Cloud Dependency

No internet means no functionality, creating a fragile reliance on connectivity.

Context Gap

Static image analysis fails to understand user intent or the dynamics of an environment.

Chapter III   The Solution

The AIris Solution

An elegant, purpose-built wearable that delivers sub-2-second, offline-first, context-aware descriptions. It is a quiet companion, a real-time narrator, and a bridge to visual freedom.

Instant Analysis

Sub-2-second response from a single button press to audio description. No apps, no menus, just instant awareness.
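In code terms, the interaction collapses to a single handler. The sketch below shows the intended shape of that pipeline against the 2-second budget; the stage functions are illustrative placeholders, not the final API.

    import time

    LATENCY_BUDGET_S = 2.0  # end-to-end target: button press to start of audio

    # Placeholder stages; the real system wires these to camera, model, and speaker.
    def capture_frame():
        return b"raw-image-bytes"

    def describe_scene(frame):
        return "A quiet street with a curb two steps ahead."

    def speak(text):
        print(f"[audio] {text}")

    def on_button_press():
        start = time.perf_counter()
        frame = capture_frame()              # single camera grab
        description = describe_scene(frame)  # local vision-language inference
        elapsed = time.perf_counter() - start
        print(f"pipeline took {elapsed:.2f}s (budget {LATENCY_BUDGET_S}s)")
        speak(description)

    on_button_press()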

Edge AI Processing

Local-first approach on a Raspberry Pi 5 ensures privacy, low latency, and functionality without an internet connection.

Safety Prioritized

The AI engine is trained to identify and announce potential hazards—like obstacles, traffic, and steps—first.
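One simple way to realize that ordering in the output stage is to reorder the description before speech synthesis. The sketch below is our illustration, not the trained model's actual mechanism; the keyword list and the sentence-level split are assumptions.

    # Sketch: move hazard sentences to the front of a scene description.
    HAZARD_TERMS = ("obstacle", "traffic", "car", "step", "stairs", "curb", "hole")

    def prioritize_hazards(description: str) -> str:
        sentences = [s.strip() for s in description.split(".") if s.strip()]
        hazards = [s for s in sentences if any(t in s.lower() for t in HAZARD_TERMS)]
        rest = [s for s in sentences if s not in hazards]
        return ". ".join(hazards + rest) + "."

    print(prioritize_hazards(
        "A park bench sits to your left. A cyclist is approaching in traffic. "
        "There is a step down at the curb"
    ))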

Human-First Design

A lightweight, comfortable, and discreet form factor designed for all-day wear, with private audio delivery.

Chapter IV   Literature Review

Grounding Our Vision in Research

The AIris project is built upon a solid foundation of academic and applied research. Our review of existing literature validates our architectural choices and highlights our key contributions to the field of assistive technology.

Key Research Gaps Addressed

Research Gap Identified | How AIris Addresses the Gap
High Latency & Cloud Dependency | An offline-first architecture on a Raspberry Pi 5 ensures sub-2-second response times, eliminating reliance on internet connectivity.
Lack of Contextual Understanding | Integration of modern Vision-Language Models (LLaVA, BLIP-2) provides rich, human-like descriptions, moving beyond simple object lists.
High Cost & Poor Accessibility | A targeted hardware budget under $160 USD and an open-source philosophy make the technology vastly more accessible than commercial alternatives.
On-Device Performance Limitations | Targeted hardware/software co-design, including model quantization and memory management, is a core development phase, not an afterthought (see the sketch below).
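The quantization named in the last row is a well-trodden step. As one illustration, PyTorch's dynamic quantization stores linear-layer weights as int8 for CPU inference; in the sketch below, a toy model stands in for the real vision-language backbone.

    import torch
    import torch.nn as nn

    # Toy stand-in for the much larger vision-language backbone.
    model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 256))

    # Dynamic quantization: weights stored as int8, activations quantized on the fly.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 768)
    print(quantized(x).shape)  # same interface, smaller memory footprint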

References

  1. Naayini, P., et al. (2025). AI-Powered Assistive Technologies for Visual Impairment.
  2. Wang, L., & Wong, A. (2019). Enabling Computer Vision Driven Assistive Devices... (foundational work)
  3. Elmannai, W., & Elleithy, K. (2017). Sensor-Based Assistive Devices for Visually-Impaired People... (foundational work)
  4. Liu, H., et al. (2023). Visual Instruction Tuning (LLaVA).
  5. Li, J., et al. (2023). BLIP-2: Bootstrapping Language-Image Pre-training...

Chapter V   System Architecture

Anatomy of Instant Vision

Our modular architecture separates the system into a wearable Spectacle Unit and a powerful Pocket Unit. This core design is flexible, allowing for multiple physical form factors.

Spectacle Unit

USB Camera
Mini Speaker

Pocket Unit

Raspberry Pi 5
Power Bank
Tactile Button
3D-Printed Case

Conceptual Form Factors

Chapter VI   Technology Deep Dive

Our Technology Stack

We are leveraging a state-of-the-art technology stack, chosen for performance on edge devices. This is not just a concept; it is an engineered system.

AI Model Evaluation

Benchmarking multiple vision-language models to find the optimal balance of speed, accuracy, and resource usage for local deployment.

  • LLaVA-v1.5: Primary for balanced local performance.
  • BLIP-2: Used as an accuracy benchmark.
  • Groq API: For high-speed cloud fallback.
  • Ollama: For flexible local LLM hosting (see the routing sketch after this list).
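That local-first, cloud-fallback routing is a small amount of glue code. The sketch below assumes a LLaVA model served by a local Ollama instance on its documented REST endpoint; the Groq call is reduced to a placeholder, and the prompt, model name, and timeout are illustrative.

    import base64

    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's local REST API

    def describe_locally(image_path: str, timeout_s: float = 30.0) -> str:
        with open(image_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode()
        resp = requests.post(
            OLLAMA_URL,
            json={
                "model": "llava",
                "prompt": "Describe this scene for a blind pedestrian.",
                "images": [image_b64],
                "stream": False,
            },
            timeout=timeout_s,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    def describe_via_groq(image_path: str) -> str:
        raise NotImplementedError("cloud fallback, wired up only when online")

    def describe(image_path: str) -> str:
        try:
            return describe_locally(image_path)   # offline-first: no network needed
        except requests.RequestException:
            return describe_via_groq(image_path)  # cloud only as a fallback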

Software Stack

Built on a robust Python foundation, utilizing industry-standard libraries for computer vision, AI, and hardware interfacing.

  • Python 3.11+ (Core Language)
  • PyTorch 2.0+ (AI Framework)
  • OpenCV (Computer Vision)
  • RPi.GPIO & picamera2 (Hardware Control; see the sketch after this list)
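As a sketch of how this layer ties the tactile button to a camera grab, the loop below uses RPi.GPIO and OpenCV for the USB camera (picamera2 is the analogue for a Pi camera module); the pin number and output path are illustrative.

    import cv2
    import RPi.GPIO as GPIO

    BUTTON_PIN = 17  # illustrative BCM pin for the tactile button

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    camera = cv2.VideoCapture(0)  # the spectacle unit's USB camera

    try:
        while True:
            GPIO.wait_for_edge(BUTTON_PIN, GPIO.FALLING)  # block until press
            ok, frame = camera.read()
            if ok:
                cv2.imwrite("frame.jpg", frame)  # hand off to the description engine
    finally:
        camera.release()
        GPIO.cleanup()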

Chapter VII   Prototyping & Evaluation

Current Development Status

We are in the active prototyping and testing phase, using a web interface to rapidly evaluate and optimize different multimodal AI models before hardware integration.
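A minimal evaluation harness in that spirit might look like the sketch below, assuming Gradio for the web interface (the framework is our assumption; the document does not name one) and a placeholder in place of the real model call.

    import gradio as gr

    def evaluate(image_path, model_name):
        # Placeholder: route the image to the selected model and time the call.
        return f"[{model_name}] description would appear here"

    demo = gr.Interface(
        fn=evaluate,
        inputs=[gr.Image(type="filepath"), gr.Dropdown(["LLaVA-v1.5", "BLIP-2"])],
        outputs=gr.Textbox(label="Scene description"),
    )
    demo.launch()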

Chapter VIII   The Blueprint

Budget & Portability

Accessibility includes affordability. We've sourced components to keep the cost under our target for the Bangladesh market, without sacrificing the core mission of complete portability.

Component Category | Cost Range (BDT) | Weight Est.
Core Computing (Pi 5, SD Card) | ৳10,600 - ৳12,600 | ~200g
Portable Power (Power Bank, Cables) | ৳2,350 - ৳3,600 | ~400g
Camera & Audio System | ৳1,980 - ৳3,470 | ~150g
Control & Housing | ৳955 - ৳1,910 | ~180g
TOTAL ESTIMATE (Target < ৳17,000) | ৳15,885 - ৳21,580 | ~930g

Chapter IX   The Roadmap

Two Phases of Innovation

Phase 1: CSE 499A (Current)

Focus: Software Foundation & AI Integration. This phase involves deep research into lightweight vision-language models, benchmarking their performance on the Raspberry Pi 5, building the core scene description engine, and optimizing the entire software pipeline for latency and efficiency.
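The per-stage timing harness this phase calls for can stay very small. In the sketch below, the stage names and sleep times are illustrative stand-ins for the real capture, inference, and text-to-speech steps.

    import time
    from contextlib import contextmanager

    timings = {}

    @contextmanager
    def stage(name):
        start = time.perf_counter()
        yield
        timings[name] = time.perf_counter() - start

    # Illustrative stages; each body is replaced by a real pipeline step.
    with stage("capture"):
        time.sleep(0.05)
    with stage("inference"):
        time.sleep(1.2)
    with stage("tts"):
        time.sleep(0.3)

    for name, t in timings.items():
        print(f"{name:10s} {t:.3f}s")
    print(f"total      {sum(timings.values()):.3f}s (target < 2.0s)")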

Phase 2: CSE 499B (Upcoming)

Focus: Hardware Integration & User Experience. This phase brings the project into the physical world. We will 3D model and print the custom enclosures, assemble the complete wearable system, and conduct extensive field testing with users to gather feedback and refine the final product.

Chapter X   Academic Alignment

Exceeding Course Outcomes

This project is meticulously designed to meet and exceed the learning outcomes for the CSE 499A/B Senior Capstone course.

Problem & Design: We identify a real-world engineering problem and design a complete, constrained hardware/software system to meet desired needs.

Modern Tools: We leverage a modern stack including Python, PyTorch, modern AI models, and embedded systems.

Constraint Validation: Our budget addresses economic constraints, our offline-first design addresses privacy, and the core function prioritizes safety.

Defense & Documentation: This experience, along with our detailed documentation, fulfills all reporting and defense requirements.

AIris

Thank you.

Questions & Answers