Initial commit - MVP project setup

Created by AI Dev Factory init-mvp-project.sh
2025-12-06 00:03:36 +00:00 · 2025-12-06 00:03:36 +00:00 · cf33cc08c1
commit cf33cc08c1
5 changed files with 559 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,31 @@
 # Dependencies
 node_modules/
 __pycache__/
 *.pyc
 *.pyo
 *.pyd
 .Python
 venv/
 .venv/
 env/
 .env
 # IDE
 .vscode/
 .idea/
 *.swp
 *.swo
 # OS
 .DS_Store
 Thumbs.db
 # Build
 dist/
 build/
 *.egg-info/
 # Test
 .coverage
 .pytest_cache/
 *.log
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -0,0 +1,268 @@
 # Image Description AI - System Architecture
 ## High-Level System Design
 The Image Description AI is a client-side single-page application (SPA) that lets users upload an image and receive an AI-generated description from the Minimax API. The code is organized in three layers, as shown below: UI components, application state and logic, and the external Minimax API integration.
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │                     Client-Side Application                 │
 ├─────────────────────────────────────────────────────────────┤
 │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
 │  │   Upload    │  │   Preview   │  │   AI Description    │  │
 │  │  Component  │  │  Component  │  │     Display         │  │
 │  └─────────────┘  └─────────────┘  └─────────────────────┘  │
 ├─────────────────────────────────────────────────────────────┤
 │                   Application State                         │
 ├─────────────────────────────────────────────────────────────┤
 │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
 │  │    File     │  │   Image     │  │   API Response      │  │
 │  │ Validation  │  │ Processing  │  │     Handler         │  │
 │  └─────────────┘  └─────────────┘  └─────────────────────┘  │
 ├─────────────────────────────────────────────────────────────┤
 │                     Minimax API                             │
 │      https://api.minimax.io/v1/text/chatcompletion_v2       │
 └─────────────────────────────────────────────────────────────┘
 ```
 ## Technology Choices and Rationale
 ### Core Technologies
 - **HTML5**: Semantic markup for accessibility and modern web standards
 - **CSS3**: Modern styling with Flexbox/Grid for responsive layouts
 - **Vanilla JavaScript**: Lightweight, no framework overhead, fast loading
 - **File API**: Native browser API for file handling and validation
 ### UI Framework Decision: Vanilla JavaScript vs React
 **Chosen: Vanilla JavaScript**
 - **Rationale**:
   - Single HTML file requirement simplifies deployment
   - Minimal bundle size improves load times
   - No build process needed
   - Direct DOM manipulation gives precise control
   - Sufficient for the application's complexity level
 ### Styling Approach
 - **CSS Grid & Flexbox**: Modern, flexible layouts
 - **CSS Custom Properties**: Maintainable theming
 - **Mobile-First Responsive Design**: Works across all device sizes
 - **CSS Animations**: Smooth transitions and loading states
 ## Database Schema
 **Not Applicable**: This is a client-side only application with no persistent data storage. All processing is transient and happens in memory.
 ## API Endpoints
 ### External API Integration
 **Endpoint**: `https://api.minimax.io/v1/text/chatcompletion_v2`
 - **Method**: POST
 - **Authentication**: Bearer token (`Authorization: Bearer <API key>` header)
 - **Content-Type**: application/json
 **Request Structure**:
 ```json
 {
  "model": "MiniMax-M2",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Please provide a detailed description of this image in English"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ..."
          }
        }
      ]
    }
  ],
  "max_tokens": 500,
  "temperature": 0.7
 }
 ```
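 The payload above can be assembled with a small helper. This is a minimal sketch rather than the project's implementation: `buildDescriptionRequest` and `DEFAULT_PROMPT` are hypothetical names, and the `max_tokens` and `temperature` values simply mirror the example request.
 ```javascript
 // Hypothetical helper: build the chat-completion payload shown above.
 // `dataUrl` must be a complete data URL, e.g. "data:image/png;base64,iVBOR...".
 const DEFAULT_PROMPT = "Please provide a detailed description of this image in English";
 function buildDescriptionRequest(dataUrl, prompt = DEFAULT_PROMPT) {
   // Reject anything that is not a base64 data URL for an accepted format.
   if (!/^data:image\/(jpeg|png|webp);base64,/.test(dataUrl)) {
     throw new Error("Expected a base64 data URL for a JPG, PNG, or WebP image");
   }
   return {
     model: "MiniMax-M2",
     messages: [
       {
         role: "user",
         content: [
           { type: "text", text: prompt },
           { type: "image_url", image_url: { url: dataUrl } }
         ]
       }
     ],
     max_tokens: 500,
     temperature: 0.7
   };
 }
 ```
 The returned object is what gets serialized with `JSON.stringify` as the POST body.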
 **Response Structure**:
 ```json
 {
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "MiniMax-M2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This image shows a beautiful sunset over..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 32,
    "total_tokens": 47
  }
 }
 ```
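 Only `choices[0].message.content` is needed from this response. A minimal sketch of extracting it defensively, assuming the body has already been parsed with `response.json()`; `extractDescription` is a hypothetical name:
 ```javascript
 // Hypothetical helper: pull the description text out of a response shaped like the one above.
 // Throws when the expected structure is missing, so the caller can show an error
 // instead of rendering "undefined".
 function extractDescription(response) {
   const choice = response && Array.isArray(response.choices) ? response.choices[0] : undefined;
   const content = choice && choice.message ? choice.message.content : undefined;
   if (typeof content !== "string" || content.trim() === "") {
     throw new Error("The API response did not contain a description");
   }
   return content.trim();
 }
 ```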
 ## File Structure
 ```
 image-description-ai/
 ├── index.html                 # Main application file (self-contained)
 ├── assets/
 │   ├── styles/
 │   │   └── main.css          # (Optional separate CSS file)
 │   └── scripts/
 │       └── main.js           # (Optional separate JS file)
 ├── README.md                 # Project documentation
 └── docs/
     ├── ARCHITECTURE.md       # This file
     └── TASKS.md              # Implementation tasks
 ```
 ### Single File Implementation
 As the project specification requires, the production deliverable is a single HTML file:
 - `index.html` contains all HTML, CSS, and JavaScript inline
 - No external dependencies or build process required
 - Opens directly in a browser or can be served as a static file
 ## Component Interactions
 ### 1. File Upload Flow
 ```
 User Input → File Selection → Validation      → Preview Display
     ↓              ↓                ↓                ↓
 Drag&Drop  → File API       → Size/Type Check → Image Preview
 Click      → Reader         → Convert to      → Update UI
            → Base64         → Base64          → State Update
 ```
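 The "Convert to Base64" step is commonly done with `FileReader.readAsDataURL`. The sketch below uses `Blob.arrayBuffer()` and `btoa` instead, which behave the same in browsers and in Node 18+; `blobToDataUrl` is a hypothetical helper, and a `File` can be passed directly because `File` extends `Blob`.
 ```javascript
 // Hypothetical helper: convert a File/Blob into a base64 data URL.
 async function blobToDataUrl(blob) {
   const bytes = new Uint8Array(await blob.arrayBuffer());
   let binary = "";
   // Build the binary string in chunks: spreading a multi-megabyte array into
   // String.fromCharCode in one call can exceed the engine's argument limit.
   const CHUNK = 0x8000;
   for (let i = 0; i < bytes.length; i += CHUNK) {
     binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
   }
   const mime = blob.type || "application/octet-stream";
   return `data:${mime};base64,${btoa(binary)}`;
 }
 ```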
 ### 2. AI Processing Flow
 ```
 Generate Click → API Request   → Loading State → Response Handling
       ↓               ↓               ↓               ↓
 Validate       → Construct     → Show Spinner  → Display Result
 Image          → Payload       → Disable UI    → Handle Errors
       ↓               ↓               ↓               ↓
 State Check    → Send POST     → Timeout       → Success/Failure
                → Minimax API   → Management    → UI Update
 ```
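 The processing flow can be sketched as one async function. This assumes the `appState` shape described under Data Flow Architecture; `generateDescription`, `callApi`, and `render` are hypothetical names, with the API call and DOM updates injected so the sketch stays independent of both:
 ```javascript
 // Hypothetical orchestration of the AI processing flow.
 // callApi(base64) -> Promise<string>; render(update) applies a UI update.
 async function generateDescription(state, { callApi, render }) {
   // Validate image / state check: need an encoded image, and ignore repeat clicks.
   if (!state.currentImage.base64) {
     render({ error: "Please upload an image first" });
     return;
   }
   if (state.apiStatus.isProcessing) return;
   state.apiStatus.isProcessing = true;
   state.apiStatus.lastError = null;
   render({ loading: true }); // show spinner, disable the Generate button
   try {
     const description = await callApi(state.currentImage.base64);
     state.ui.showResults = true;
     render({ description });
   } catch (err) {
     state.apiStatus.lastError = err.message;
     render({ error: err.message });
   } finally {
     state.apiStatus.isProcessing = false;
     render({ loading: false }); // re-enable the UI on success or failure
   }
 }
 ```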
 ### 3. Error Handling Flow
 ```
 Any Error  → Error Handler → User Notification → Recovery Option
     ↓              ↓                 ↓               ↓
 Network    → Categorize    → Clear Message     → Reset/Retry
 API        → Error Type    → Visual Alert      → State Reset
 File       → Log Details   → Action Required   → User Guidance
 Validation → Store State   → UX Feedback       → Continue Flow
 ```
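 One way to implement the "Categorize" step, as a sketch: the `ValidationError` and `ApiError` classes and `categorizeError` are hypothetical, and the mapping relies on `fetch` rejecting with a `TypeError` on network failure and with an `AbortError` when a timeout aborts the request.
 ```javascript
 // Hypothetical error types thrown by the validation and API layers.
 class ValidationError extends Error {}
 class ApiError extends Error {
   constructor(status, message) {
     super(message);
     this.status = status; // HTTP status code from the Minimax API
   }
 }
 // Map any thrown value to a category and a user-facing message.
 function categorizeError(err) {
   if (err instanceof ValidationError) {
     return { category: "validation", message: err.message };
   }
   if (err && err.name === "AbortError") {
     return { category: "network", message: "The request timed out. Please try again." };
   }
   if (err instanceof ApiError) {
     if (err.status === 401 || err.status === 403) {
       return { category: "api", message: "The API key was rejected. Check your configuration." };
     }
     if (err.status === 429) {
       return { category: "api", message: "Too many requests. Please wait a moment and retry." };
     }
     return { category: "api", message: `The AI service returned an error (HTTP ${err.status}).` };
   }
   if (err instanceof TypeError) {
     return { category: "network", message: "Network error. Check your connection and retry." };
   }
   console.error(err); // log details of unexpected errors for debugging
   return { category: "unknown", message: "Something went wrong. Please try again." };
 }
 ```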
 ## Data Flow Architecture
 ### Application State Management
 ```javascript
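 // Initial application state; trailing comments give each field's type.
 const appState = {
   currentImage: {
     file: null,          // File | null
     base64: null,        // string | null (base64 data URL for the API request)
     preview: null,       // string | null (URL shown in the <img> preview)
     metadata: {
       name: "",          // string
       size: 0,           // number (bytes)
       type: ""           // string (MIME type)
     }
   },
   apiStatus: {
     isProcessing: false, // boolean
     lastError: null,     // string | null
     requestId: null      // string | null
   },
   ui: {
     dragOver: false,     // boolean
     showPreview: false,  // boolean
     showResults: false   // boolean
   }
 };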
 ```
 ### Event-Driven Architecture
 - **File Input Events**: Handle drag&drop, click-to-browse, file selection
 - **Validation Events**: File size, type, and format checking
 - **API Events**: Request initiation, response handling, error management
 - **UI Events**: Loading states, animations, user feedback
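 The file input events can be wired with standard `addEventListener` calls. A sketch for the drag-and-drop zone, assuming hypothetical callbacks `onFile` and `onDragStateChange` that update `appState` and the UI; `wireDropZone` accepts any `EventTarget`, which in the page would be the upload area element:
 ```javascript
 // Hypothetical helper: attach drag-and-drop handlers to the upload area.
 function wireDropZone(zone, { onFile, onDragStateChange }) {
   zone.addEventListener("dragover", (event) => {
     event.preventDefault(); // required, otherwise the browser refuses the drop
     onDragStateChange(true);
   });
   zone.addEventListener("dragleave", () => onDragStateChange(false));
   zone.addEventListener("drop", (event) => {
     event.preventDefault(); // stop the browser from opening the file itself
     onDragStateChange(false);
     const files = event.dataTransfer ? event.dataTransfer.files : [];
     if (files.length > 0) onFile(files[0]); // only the first file is used
   });
   // Click-to-browse would forward the hidden <input type="file"> change event
   // to the same onFile callback.
 }
 ```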
 ## Security Considerations
 ### Client-Side Security
 - **Input Validation**: Strict file type and size checking
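 - **XSS Prevention**: Render API responses as text (e.g. via `textContent`), never as unescaped HTML
 - **API Key Management**: The key is embedded in client-side code and visible to anyone who loads the page; production should use a server-side proxy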
 - **HTTPS Only**: Secure transmission to Minimax API
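 For sanitized display, setting `element.textContent` is the simplest safe option. If the description is instead formatted into paragraphs through `innerHTML`, every piece of model output must be escaped first. A sketch with hypothetical helper names `escapeHtml` and `descriptionToHtml`:
 ```javascript
 // Escape the HTML-significant characters; "&" must be replaced first.
 function escapeHtml(text) {
   return String(text)
     .replace(/&/g, "&amp;")
     .replace(/</g, "&lt;")
     .replace(/>/g, "&gt;")
     .replace(/"/g, "&quot;")
     .replace(/'/g, "&#39;");
 }
 // Turn blank-line-separated text into <p> elements and single newlines into <br>,
 // escaping the model output so markup in it is displayed rather than executed.
 function descriptionToHtml(text) {
   return text
     .replace(/\r\n/g, "\n")
     .split(/\n{2,}/)
     .map((p) => p.trim())
     .filter((p) => p !== "")
     .map((p) => `<p>${escapeHtml(p).replace(/\n/g, "<br>")}</p>`)
     .join("");
 }
 ```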
 ### Production Recommendations
 1. **Server-Side API Proxy**: Move API calls to backend to hide API keys
 2. **Rate Limiting**: Prevent API abuse
 3. **File Scanning**: Server-side malware detection
 4. **Content Security Policy**: Additional XSS protection
 ## Performance Optimizations
 ### Client-Side Optimizations
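 - **Lazy Rendering**: Create the preview and results elements only when they are first needed
 - **Debounced Validation**: Reduce unnecessary processing
 - **Memory Management**: Release base64 strings and revoke object URLs (`URL.revokeObjectURL`) when an image is replaced or the app is reset
 - **Progressive Enhancement**: Page content and instructions render without JavaScript; uploading and generating descriptions require it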
 ### API Optimizations
 - **Request Compression**: Minimize payload size (for example, by downscaling very large images before base64 encoding)
 - **Timeout Management**: Prevent hanging requests
 - **Retry Logic**: Handle transient network failures
 - **Caching**: Avoid duplicate API calls for same images
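 Timeout management and retry logic can be combined in one request helper. This sketch assumes a 30-second timeout, two retries with exponential backoff, and retries only for network errors, HTTP 429, and HTTP 5xx; `postWithRetry` and its options are hypothetical, and `fetchImpl` exists only so the function can be exercised without a network:
 ```javascript
 // Hypothetical request helper: timeout via AbortController plus bounded retries.
 const MINIMAX_URL = "https://api.minimax.io/v1/text/chatcompletion_v2";
 const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
 async function postWithRetry(payload, apiKey, options = {}) {
   const { fetchImpl = fetch, timeoutMs = 30000, retries = 2, backoffMs = 500 } = options;
   for (let attempt = 0; ; attempt++) {
     const controller = new AbortController();
     const timer = setTimeout(() => controller.abort(), timeoutMs);
     try {
       const res = await fetchImpl(MINIMAX_URL, {
         method: "POST",
         headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
         body: JSON.stringify(payload),
         signal: controller.signal
       });
       const retryable = res.status === 429 || res.status >= 500;
       if (!retryable || attempt >= retries) return res; // caller checks res.ok
     } catch (err) {
       // Timeouts surface as AbortError and are not retried; give up on the last attempt.
       if (err.name === "AbortError" || attempt >= retries) throw err;
     } finally {
       clearTimeout(timer);
     }
     await sleep(backoffMs * 2 ** attempt); // 500 ms, then 1000 ms with the defaults
   }
 }
 ```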
 ## Scalability Considerations
 ### Current Architecture Limits
 - **Client-Only Processing**: Limited by user's device capabilities
 - **File Size Constraints**: 5MB limit for practical performance; base64 encoding makes the request body about one third larger than the file
 - **API Rate Limits**: Dependent on Minimax service limits
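 Because the image travels as base64 text, the request is larger than the file. A worked calculation, assuming 5MB means 5 × 1024 × 1024 bytes (the helper names are illustrative):
 ```javascript
 // Base64 maps every 3 input bytes to 4 output characters, padding the last group,
 // so the encoded length is 4 * ceil(n / 3).
 function base64EncodedLength(byteLength) {
   return 4 * Math.ceil(byteLength / 3);
 }
 // Length of the full data URL placed in the request, e.g. "data:image/jpeg;base64,...".
 function dataUrlLength(byteLength, mimeType) {
   return `data:${mimeType};base64,`.length + base64EncodedLength(byteLength);
 }
 const FIVE_MB = 5 * 1024 * 1024; // 5,242,880 bytes (assuming MB means MiB)
 console.log(base64EncodedLength(FIVE_MB)); // 6990508 characters, about 6.67 MiB
 ```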
 ### Future Enhancements
 - **Backend Integration**: Server-side processing and API management
 - **Batch Processing**: Multiple image handling
 - **User Accounts**: Save and manage image descriptions
 - **Advanced Features**: Multiple language support, custom prompts
 ## Browser Compatibility
 ### Supported Features
 - **File API**: IE10+ and all modern browsers
 - **Base64 Encoding**: Universal browser support
 - **CSS Grid/Flexbox**: All modern browsers; IE11 supports Flexbox but only an older, `-ms-`-prefixed Grid syntax
 - **Fetch API**: All modern browsers; not supported in IE11, where a polyfill is required
 ### Fallback Strategies
 - **Older Browsers**: Graceful degradation with polyfills
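 - **No JavaScript**: Show a `<noscript>` notice; there is no backend to accept a plain form submission
 - **Network Issues**: Detect offline state (e.g. `navigator.onLine`) and queue the request until the connection returns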
 ## Deployment Architecture
 ### Static File Deployment
 - **CDN Ready**: Single HTML file suitable for any CDN
 - **Zero Dependencies**: No npm packages or build process
 - **Instant Deployment**: Upload and serve immediately
 - **Version Control**: Simple Git-based version management
 ### Environment Configuration
 - **Development**: Direct API calls with test keys
 - **Staging**: Mirror production with environment-specific settings  
 - **Production**: Server-side API proxy recommended for security
--- a/PROJECT_SPEC.md
+++ b/PROJECT_SPEC.md
@@ -0,0 +1,43 @@
 Create a clean, modern web application that allows users to upload an image and get an AI-generated description using the Minimax API.
 **Requirements:**
 1. **Frontend Interface:**
   - Single page application with a clean, centered layout
   - File upload area (drag-and-drop support + click to browse)
   - Image preview after upload
   - "Generate Description" button
   - Loading state while processing
   - Display area for the AI-generated description
   - Option to upload a new image after getting results
 2. **Technical Implementation:**
   - Use vanilla JavaScript, HTML, and CSS (or React if you prefer)
   - Handle image file validation (accept common formats: jpg, png, webp)
   - Convert uploaded image to base64 for API submission
   - Make POST request to Minimax API endpoint
   - Handle API responses and errors gracefully
   - Display clear error messages if something goes wrong
 3. **Minimax API Integration:**
   - Endpoint: `https://api.minimax.io/v1/text/chatcompletion_v2`
   - Use model: `MiniMax-M2` 
   - Send image as base64 in the messages array
   - Prompt: "Please provide a detailed description of this image in English"
   - Handle API key securely (note: for production, this should be handled server-side)
 4. **UI/UX Details:**
   - Responsive design that works on mobile and desktop
   - Professional color scheme (suggest modern blues/grays)
   - Smooth transitions and loading animations
   - Clear visual feedback for all user actions
 5. **Error Handling:**
   - File size validation (max 5MB recommended)
   - File type validation
   - API error handling with user-friendly messages
   - Network error handling
 Please create a complete, production-ready single HTML file with inline CSS and JavaScript that I can immediately use in a browser.
--- a/TASKS.md
+++ b/TASKS.md
@@ -0,0 +1,174 @@
 # Image Description AI - Implementation Tasks
 ## Task 1: Create Basic HTML Structure and Styling Foundation
 **Objective**: Establish the foundational HTML structure with semantic markup and responsive CSS framework
 **Deliverables**: 
 - Complete HTML skeleton with proper DOCTYPE and meta tags
 - Responsive CSS Grid layout system for the main application container
 - Modern color scheme implementation (blues/grays as specified)
 - Typography system with readable fonts and proper spacing
 - Mobile-first responsive breakpoints
 **Acceptance Criteria**:
 - [ ] HTML validates as HTML5 standard
 - [ ] Layout is responsive across mobile (320px+), tablet (768px+), and desktop (1024px+)
 - [ ] Color scheme uses professional blue/gray palette with proper contrast ratios
 - [ ] Typography is legible across all device sizes
 - [ ] Basic layout structure includes: header, main upload area, preview section, results section
 - [ ] CSS is modular with clear section organization (layout, components, utilities)
 ## Task 2: Implement File Upload Interface with Drag & Drop
 **Objective**: Create an intuitive file upload interface supporting both drag-and-drop and click-to-browse functionality
 **Deliverables**:
 - Drag-and-drop zone with visual feedback states (drag over, drop, default)
 - Hidden file input element for click-to-browse functionality
 - Visual upload area with icon, instructional text, and file format specifications
 - File type icon display for different image formats
 - Hover and focus states for accessibility
 **Acceptance Criteria**:
 - [ ] Drag-and-drop zone visually responds to drag events with proper styling
 - [ ] Click-to-browse opens file dialog and triggers file selection
 - [ ] Upload area shows clear instructions: "Drag an image here or click to browse"
 - [ ] Supported formats are displayed: JPG, PNG, WebP
 - [ ] Accessibility: Keyboard navigation and screen reader support
 - [ ] Visual feedback on hover/focus with smooth transitions
 ## Task 3: Add File Validation and Error Handling System
 **Objective**: Implement comprehensive file validation to ensure only valid images are processed
 **Deliverables**:
 - File type validation (accept only jpg, jpeg, png, webp)
 - File size validation (maximum 5MB with user-friendly error messages)
 - Validation feedback system with clear error messages
 - File metadata extraction (name, size, type)
 - Reset functionality to clear errors and start over
 **Acceptance Criteria**:
 - [ ] Invalid file types show error: "Please select a valid image file (JPG, PNG, WebP)"
 - [ ] Files over 5MB show error: "File size must be less than 5MB"
 - [ ] Error messages display in red text below upload area
 - [ ] Successful validation clears previous errors
 - [ ] Validation occurs immediately upon file selection
 - [ ] Files with valid extensions but invalid content are caught
 - [ ] Reset button clears all errors and upload area state
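 A sketch of the checks above, assuming that MIME type and extension must both match and that 5MB means 5 × 1024 × 1024 bytes. The magic-byte check covers the "valid extension, invalid content" case; all names are hypothetical:
 ```javascript
 // Hypothetical Task 3 validators; limits and messages follow the acceptance criteria.
 const MAX_BYTES = 5 * 1024 * 1024;
 const ALLOWED_TYPES = ["image/jpeg", "image/png", "image/webp"];
 const ALLOWED_EXTENSIONS = ["jpg", "jpeg", "png", "webp"];
 const TYPE_ERROR = "Please select a valid image file (JPG, PNG, WebP)";
 const SIZE_ERROR = "File size must be less than 5MB";
 // Check a File's metadata; returns an error message, or null when valid.
 function validateImageFile({ name, size, type }) {
   const ext = name.includes(".") ? name.split(".").pop().toLowerCase() : "";
   if (!ALLOWED_TYPES.includes(type) || !ALLOWED_EXTENSIONS.includes(ext)) return TYPE_ERROR;
   if (size > MAX_BYTES) return SIZE_ERROR;
   return null;
 }
 // Detect the real format from the first bytes ("magic numbers"), so a renamed
 // non-image is still rejected. Returns a MIME type, or null if unrecognized.
 function sniffImageType(bytes) {
   const startsWith = (sig, offset = 0) => sig.every((b, i) => bytes[offset + i] === b);
   if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
   if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
   // WebP: "RIFF" at offset 0 and "WEBP" at offset 8.
   if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) return "image/webp";
   return null;
 }
 ```
 In the browser, the first bytes can be read with `new Uint8Array(await file.slice(0, 12).arrayBuffer())`, and the sniffed type compared with `file.type`.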
 ## Task 4: Implement Image Preview Functionality
 **Objective**: Display uploaded image with proper sizing and formatting for user confirmation
 **Deliverables**:
 - Image preview container with proper aspect ratio handling
 - Image resizing and optimization for preview (max 400px width/height)
 - Base64 encoding of the selected image for API submission
 - Metadata display (filename, file size, dimensions)
 - Replace/change image functionality
 **Acceptance Criteria**:
 - [ ] Image preview displays within 2 seconds of file selection
 - [ ] Preview maintains aspect ratio without distortion
 - [ ] Large images are scaled appropriately for preview
 - [ ] Base64 encoding completes successfully and is stored in memory
 - [ ] Image metadata is extracted and displayed (filename, size in MB, dimensions)
 - [ ] "Change Image" button allows uploading a different file
 - [ ] Preview clears when starting over
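 The sizing and metadata criteria reduce to two small calculations. A sketch with hypothetical helper names; the dimensions would come from the loaded preview image's `naturalWidth` and `naturalHeight`:
 ```javascript
 // Scale (width, height) to fit inside a max x max box without distortion.
 // A single scale factor preserves the aspect ratio; capping it at 1 avoids upscaling.
 function fitWithin(width, height, max = 400) {
   const scale = Math.min(1, max / width, max / height);
   return { width: Math.round(width * scale), height: Math.round(height * scale) };
 }
 // Format a byte count for the metadata line, e.g. 1572864 -> "1.50 MB".
 function formatMegabytes(bytes) {
   return `${(bytes / (1024 * 1024)).toFixed(2)} MB`;
 }
 ```
 The same fit can also be done purely in CSS with `max-width`, `max-height`, and `object-fit: contain`; computing it in JavaScript is mainly useful when the dimensions are shown as metadata too.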
 ## Task 5: Integrate Minimax API for Image Description Generation
 **Objective**: Connect to Minimax API and implement the core AI description generation functionality
 **Deliverables**:
 - API request construction with proper payload format
 - Base64 image embedding in the request body
 - Proper error handling for network issues and API responses
 - Response parsing and extraction of the AI-generated description
 - API key management (client-side with security notes)
 **Acceptance Criteria**:
 - [ ] API request uses correct endpoint: `https://api.minimax.io/v1/text/chatcompletion_v2`
 - [ ] Request payload includes model: "MiniMax-M2"
 - [ ] Base64 image is properly formatted in the messages array
 - [ ] Prompt "Please provide a detailed description of this image in English" is included
 - [ ] Successful API response extracts the description text
 - [ ] API errors are handled gracefully with user-friendly messages
 - [ ] Network timeouts are handled with appropriate error messaging
 - [ ] Loading state is shown during API calls
 ## Task 6: Create Loading States and User Feedback System
 **Objective**: Implement comprehensive loading and feedback mechanisms to enhance user experience
 **Deliverables**:
 - Loading spinner and progress indicator during API calls
 - Status messages for different processing stages
 - Button state management (disabled during processing)
 - Timeout handling with user notification
 - Success and error state animations
 **Acceptance Criteria**:
 - [ ] Loading spinner appears immediately when "Generate Description" is clicked
 - [ ] "Generate Description" button is disabled during processing to prevent duplicate requests
 - [ ] Status message shows: "Analyzing image with AI..."
 - [ ] Processing timeout (30 seconds) shows: "Processing is taking longer than expected"
 - [ ] Success animation plays when description is generated
 - [ ] Error state shows appropriate error message with red styling
 - [ ] Loading states have smooth transitions and professional appearance
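 The 30-second notice can be decoupled from the request itself. A minimal sketch, assuming the request is represented by a promise; `withSlowNotice` is a hypothetical helper, and the commented usage refers to hypothetical `statusEl` and `request` values:
 ```javascript
 // If `promise` has not settled after `slowMs`, call `onSlow` once. The request keeps
 // running; hard cancellation is handled separately (e.g. with AbortController).
 function withSlowNotice(promise, onSlow, slowMs = 30000) {
   const timer = setTimeout(onSlow, slowMs);
   return promise.finally(() => clearTimeout(timer));
 }
 // Usage (hypothetical):
 // statusEl.textContent = "Analyzing image with AI...";
 // await withSlowNotice(request, () => {
 //   statusEl.textContent = "Processing is taking longer than expected";
 // });
 ```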
 ## Task 7: Display AI Description Results with Formatting
 **Objective**: Present the AI-generated description in a clean, readable format with additional functionality
 **Deliverables**:
 - Results display area with proper typography and spacing
 - Text formatting and paragraph handling for long descriptions
 - Option to copy description to clipboard
 - "Generate New Description" functionality for the same image
 - "Upload New Image" reset functionality
 - Responsive results layout
 **Acceptance Criteria**:
 - [ ] Description displays in a readable format with proper line breaks
 - [ ] Long descriptions are scrollable if they exceed viewport
 - [ ] "Copy to Clipboard" button works and shows confirmation
 - [ ] "Generate New Description" button triggers new API call
 - [ ] "Upload New Image" button clears all data and returns to upload state
 - [ ] Results area is visually distinct from upload area
 - [ ] Typography is large enough to read comfortably on mobile devices
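 A sketch of the copy action using the asynchronous Clipboard API (`navigator.clipboard.writeText`), which is only available in secure contexts (HTTPS or localhost) and may be denied by the browser. `copyDescription` is hypothetical; the clipboard object is a parameter so the function can be tested without a browser:
 ```javascript
 // Hypothetical Task 7 helper: copy text and return a confirmation message.
 async function copyDescription(text, clipboard = globalThis.navigator && navigator.clipboard) {
   if (!clipboard || typeof clipboard.writeText !== "function") {
     return "Copying is not supported in this browser";
   }
   try {
     await clipboard.writeText(text);
     return "Copied to clipboard";
   } catch {
     // Permission denied or no user gesture: fall back to manual copying.
     return "Could not copy. Please select the text and copy it manually.";
   }
 }
 ```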
 ## Task 8: Final Polish, Testing, and Production Readiness
 **Objective**: Complete final testing, optimizations, and prepare for immediate browser deployment
 **Deliverables**:
 - Comprehensive error handling for all edge cases
 - Performance optimization for large images and slow networks
 - Cross-browser compatibility testing
 - Accessibility improvements (ARIA labels, keyboard navigation)
 - Final code organization and documentation
 - Single HTML file consolidation with inline CSS and JavaScript
 **Acceptance Criteria**:
 - [ ] Application works in Chrome, Firefox, Safari, and Edge
 - [ ] All functionality works on mobile devices (iOS and Android)
 - [ ] Images up to 5MB process within 30 seconds on average connections
 - [ ] Clear error messages for all failure scenarios (network, API, validation)
 - [ ] Complete keyboard navigation support
 - [ ] ARIA labels and semantic HTML for screen readers
 - [ ] No console errors or warnings
 - [ ] Single HTML file contains all code and loads immediately in any modern browser
 - [ ] Professional appearance with smooth animations and transitions
 - [ ] File size under 50KB for fast loading
 ## Implementation Notes
 ### Task Dependencies
 - Tasks 1-2 can be implemented in parallel
 - Task 3 depends on Task 2 (validation needs upload interface)
 - Task 4 depends on Task 3 (preview needs validation)
 - Task 5 depends on Task 4 (API needs base64 image)
 - Task 6 depends on Task 5 (loading states during API calls)
 - Task 7 depends on Task 6 (results display after processing)
 - Task 8 depends on all previous tasks (final testing and polish)
 ### Technical Considerations
 - Use ES6+ JavaScript features and target current evergreen browsers (no transpilation step)
 - Implement CSS custom properties for maintainable theming
 - Follow progressive enhancement principles
 - Maintain separation of concerns within the single file
 - Wrap each asynchronous step (file reading, API calls) in `try`/`catch` so failures surface as user-facing error messages
 ### Testing Approach
 - Test with various image sizes and formats
 - Test error scenarios (large files, invalid types, network issues)
 - Verify responsive behavior across devices
 - Validate accessibility with screen readers
 - Performance test with slow network connections
--- a/prompt.md
+++ b/prompt.md
@@ -0,0 +1,43 @@
 Create a clean, modern web application that allows users to upload an image and get an AI-generated description using the Minimax API.
 **Requirements:**
 1. **Frontend Interface:**
   - Single page application with a clean, centered layout
   - File upload area (drag-and-drop support + click to browse)
   - Image preview after upload
   - "Generate Description" button
   - Loading state while processing
   - Display area for the AI-generated description
   - Option to upload a new image after getting results
 2. **Technical Implementation:**
   - Use vanilla JavaScript, HTML, and CSS (or React if you prefer)
   - Handle image file validation (accept common formats: jpg, png, webp)
   - Convert uploaded image to base64 for API submission
   - Make POST request to Minimax API endpoint
   - Handle API responses and errors gracefully
   - Display clear error messages if something goes wrong
 3. **Minimax API Integration:**
   - Endpoint: `https://api.minimax.io/v1/text/chatcompletion_v2`
   - Use model: `MiniMax-M2` 
   - Send image as base64 in the messages array
   - Prompt: "Please provide a detailed description of this image in English"
   - Handle API key securely (note: for production, this should be handled server-side)
 4. **UI/UX Details:**
   - Responsive design that works on mobile and desktop
   - Professional color scheme (suggest modern blues/grays)
   - Smooth transitions and loading animations
   - Clear visual feedback for all user actions
 5. **Error Handling:**
   - File size validation (max 5MB recommended)
   - File type validation
   - API error handling with user-friendly messages
   - Network error handling
 Please create a complete, production-ready single HTML file with inline CSS and JavaScript that I can immediately use in a browser.