Multilingual RAG · RLHF · Agentic RAG · Powered by Groq Free API
Ask questions in your language — the shared embedding space maps cross-lingual queries to documents natively. No translation step needed.
paraphrase-multilingual-MiniLM-L12-v2 — 471 MB, CPU-only. Maps all 50+ languages into a single shared vector space so Hindi queries match English docs seamlessly.
mmarco-mMiniLMv2-L12-H384-v1 — cross-encoder trained on MS MARCO in 13 languages. Scores query–passage relevance across language boundaries.
langdetect identifies your query language instantly. A language badge appears in the UI and the LLM is instructed to respond in the same language.
Voice input auto-detects spoken language — no language selector needed. Speak Hindi, French, Spanish or any of 99 supported languages and Whisper transcribes it perfectly.
Highlighted steps are multilingual-aware
PPO-style reward model adapts the chatbot's behaviour in real-time from your 👍/👎 feedback — no GPU, no retraining.
Ask in Hindi, French, German, Spanish, Japanese or 46 more languages. Shared embedding space — no translation step. Auto language badge in the UI.
Upload PDF / DOCX / TXT. 6-layer pipeline: semantic chunks → multilingual BM25+FAISS → RRF fusion → mMiniLM rerank → citations.
ReAct loop with multi-hop retrieval (up to 3 hops). LLM reasons, picks tools (search / summarise / verify), self-corrects, and gives a grounded final answer.
PPO-style reward model adapts temperature in real-time from 👍/👎 feedback. Export replay buffer as JSONL for offline fine-tuning with HuggingFace TRL.
Conversational AI with 20-turn sliding memory. RLHF policy adapts the generation style to your preferences over the session.
Click 🎙️, speak, stop — auto-transcribed by Whisper Large v3 Turbo (Groq). Supports 99 languages with automatic language detection.
6 specialised personas: Medical · Legal · Finance · Code · Fitness · Research. Domain-tuned system prompts for accurate expert-level answers.
Llama Guard 3 20B checks every response for unsafe content before it reaches you. Toggle on/off in the sidebar.
| Model | Type | Context | Best For |
|---|---|---|---|
| Llama 4 Scout 17B NEW | Chat | 128K | Latest Meta — fast & accurate |
| Qwen 3 32B NEW | Chat | 32K | Multilingual tasks |
| Llama 3.3 70B | Chat | 128K | Best overall quality (default) |
| Llama 3.1 8B | Chat | 128K | Speed-critical tasks |
| Mixtral 8×7B | Chat | 32K | Long documents |
| Whisper Large v3 Turbo | Audio | — | Speech → Text · 99 languages · auto-detect |
| Llama Guard 3 20B | Safety | — | Content safety filter |
| multilingual-MiniLM-L12-v2 NEW | Embeddings | — | 50+ language RAG embeddings (local) |
| mmarco-mMiniLMv2-L12 NEW | Reranker | — | Cross-lingual reranking · 13 languages (local) |
git clone https://github.com/rajneeshbabu/penguin-ai.gitcd penguin-ai
pip install -r requirements.txt
Includes langdetect for auto language detection.
Free key at console.groq.comecho "GROQ_API_KEY=gsk_..." > .env
streamlit run app.py
Opens at localhost:8501