
Teja Kusireddy

Backend / AI Infrastructure Engineer focused on distributed systems, reliability, observability, and LLM infrastructure.

M.S. Computer Science student at Pace University. Previously a Software Engineer at Tata Consultancy Services; currently building production-grade AI agent infrastructure at GetOnStack.

Portfolio · LinkedIn · Medium · Email


Open Source Contributions

Contributed 6 test-backed fixes across AWS CDK and Microsoft AI/cloud OSS, including Semantic Kernel, Agent Framework, ai4s-jobq, and Multicloud DB SDK.

| Area | Contribution Focus |
| --- | --- |
| AWS CDK | CloudFormation token handling |
| Semantic Kernel | Agent orchestration reliability |
| Agent Framework | YAML / parser correctness |
| ai4s-jobq | Distributed queue workflow reliability |
| Multicloud DB SDK | Parser safety limits and cloud database error semantics |

Focus areas: cloud infrastructure, AI agent orchestration, distributed workflows, parser correctness, and production reliability.


Featured Engineering Work

ForgeAI — AI Inference Gateway

OpenAI-compatible inference gateway for model routing, retrieval-augmented generation, and AI cost optimization.

Tech: Python, FastAPI, gRPC, vLLM, Qdrant, Redis, PostgreSQL, Kubernetes

Highlights

  • Built contextual-bandit routing for model selection (sketched below)
  • Added RAG support with vector retrieval
  • Added semantic caching and request optimization
  • Achieved 72.9% lower inference cost at 94ms p50 latency on NVIDIA H100
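
The contextual-bandit routing above can be sketched roughly as follows. This is an illustrative epsilon-greedy policy over model "arms", not ForgeAI's actual implementation; the model names, context features, and reward signal are placeholders.

```python
import random
from collections import defaultdict

# Hypothetical model "arms" the router can choose between.
MODELS = ["small-fast", "medium", "large-accurate"]

class EpsilonGreedyRouter:
    """Illustrative contextual bandit: pick a model per request context,
    then update that arm's reward estimate (e.g. quality minus cost)."""

    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        # Per-(context, model) running reward estimates and pull counts.
        self.values = defaultdict(float)
        self.counts = defaultdict(int)

    def _context(self, prompt: str) -> str:
        # Toy context bucketing by prompt length; a real router would use
        # richer features (task type, token count, retrieval hits, ...).
        return "short" if len(prompt) < 500 else "long"

    def choose(self, prompt: str) -> str:
        ctx = self._context(prompt)
        if random.random() < self.epsilon:
            return random.choice(MODELS)                              # explore
        return max(MODELS, key=lambda m: self.values[(ctx, m)])       # exploit

    def update(self, prompt: str, model: str, reward: float) -> None:
        key = (self._context(prompt), model)
        self.counts[key] += 1
        # Incremental mean update of the reward estimate.
        self.values[key] += (reward - self.values[key]) / self.counts[key]

router = EpsilonGreedyRouter()
model = router.choose("Summarize this contract...")
# After serving the request, reward could combine answer quality and cost.
router.update("Summarize this contract...", model, reward=0.8)
```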

FalconQ — Distributed Messaging Queue

Distributed messaging queue built in Go with partitioned topics, consistent hashing, replication, and observability.

Tech: Go, AWS EKS, Raft, Terraform, Prometheus, Grafana

Highlights

  • Handles hot-key skew using partitioned topics and consistent hashing (sketched after this list)
  • Preserves write correctness during load balancing and failover
  • Validated at 1.5M messages/min with p99 latency under 12ms
  • Includes observability for broker and queue behavior
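
FalconQ itself is written in Go; as a rough illustration of how consistent hashing keeps hot keys spread across partitions, here is a minimal hash-ring sketch in Python. The broker names and virtual-node count are arbitrary.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: a key maps to the first broker
    clockwise from its hash, so adding or removing a broker only
    remaps a small fraction of keys."""

    def __init__(self, brokers, vnodes: int = 64):
        self.ring = []  # sorted list of (hash, broker) pairs
        for broker in brokers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{broker}#{i}"), broker))
        self.ring.sort()
        self._hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def route(self, key: str) -> str:
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["broker-a", "broker-b", "broker-c"])
print(ring.route("orders-partition-7"))  # same key always lands on the same broker
```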

Copilot Plugin API — Copilot-Style LLM API Platform

C#/.NET API platform for LLM workflows with streaming, memory, plugin execution, and production-style request handling.

Tech: C#, .NET 8, Azure OpenAI, Redis, SQL, SSE, xUnit

Highlights

  • Built SSE streaming for real-time LLM responses (see the sketch below)
  • Added Redis-backed memory for workflow state
  • Built plugin execution support for tool-based workflows
  • Added rate limits, idempotency, semantic caching, and xUnit test coverage
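
The platform is C#/.NET; the SSE streaming highlight translates to the pattern below, shown here as a Python/FastAPI sketch with a stand-in token generator rather than the real Azure OpenAI client.

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_llm_tokens(prompt: str):
    # Stand-in for an LLM client that yields tokens as they are generated.
    for token in ["Hello", ", ", "world", "!"]:
        yield token

@app.get("/chat")
async def chat(prompt: str):
    async def event_stream():
        # Server-Sent Events: each chunk is a "data:" line followed by a blank line.
        async for token in fake_llm_tokens(prompt):
            yield f"data: {token}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```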

OpenGPU Lab — CPU/CUDA/RTL GPU Execution Stack

End-to-end GPU systems lab with CPU, CUDA, and RTL backends, plus a GPU optimization CLI for roofline-guided kernel analysis.

Tech: C++, CUDA, Verilog, SystemVerilog, Verilator, Icarus Verilog, CMake, Python

Highlights

  • Built a unified C++ runtime across CPU, CUDA, and RTL backends
  • Verified numerical parity across backend outputs with 0.0 max error
  • Built gpuopt, a CLI for roofline modeling and memory-access analysis (illustrated below)
  • Improved arithmetic intensity from 10.67 to 16.00 FLOPs/byte by increasing the tile size from 48 to 64 and adding shared-memory staging
  • Validated RTL behavior with clock/reset-safe simulation
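
As a rough illustration of the roofline math behind gpuopt: arithmetic intensity is total FLOPs divided by bytes moved, and comparing it to the machine balance (peak FLOPS / memory bandwidth) indicates whether a kernel is memory- or compute-bound. The tiled-matmul estimate and hardware numbers below are simplified placeholders, not the measured figures above.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs per byte of DRAM traffic."""
    return flops / bytes_moved

def roofline_bound(ai: float, peak_flops: float, mem_bw: float) -> str:
    """A kernel is memory-bound while AI is below the ridge point (peak / bandwidth)."""
    ridge = peak_flops / mem_bw
    return "memory-bound" if ai < ridge else "compute-bound"

def tiled_matmul_ai(tile: int) -> float:
    # Simplified tiled-matmul estimate: 2*T^3 FLOPs per TxT tile, with the
    # A and B tiles staged through shared memory (4-byte floats).
    flops = 2 * tile ** 3
    bytes_moved = 2 * tile * tile * 4
    return arithmetic_intensity(flops, bytes_moved)

for t in (48, 64):
    ai = tiled_matmul_ai(t)
    # Hypothetical peak numbers (roughly H100-class FP32 and HBM bandwidth).
    print(t, round(ai, 2), roofline_bound(ai, peak_flops=60e12, mem_bw=3e12))
```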

Structured Extraction Pipeline — Legal Document Intelligence

Vector retrieval and structured extraction pipeline for semantic search across large legal document collections.

Tech: Python, FastAPI, PostgreSQL, Qdrant, OpenAI, Terraform, GCP

Highlights

  • Built semantic search over 10M+ legal documents (see the retrieval sketch below)
  • Added FastAPI service layer for retrieval and extraction
  • Added strict typing and CI/CD ingestion gates
  • Designed a retrieval-first architecture for reliable document intelligence
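
A bare-bones version of the retrieval path could look like the sketch below. The collection name, embedding model, and payload fields are placeholders, and the qdrant-client search call is one common way to query a collection, not necessarily what the pipeline uses.

```python
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()                         # assumes OPENAI_API_KEY is set
qdrant = QdrantClient(url="http://localhost:6333")

def embed(text: str) -> list[float]:
    # Any embedding model works; this one is just an example.
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def search_opinions(query: str, limit: int = 5):
    # "legal_opinions" is a placeholder collection name.
    hits = qdrant.search(
        collection_name="legal_opinions",
        query_vector=embed(query),
        limit=limit,
    )
    return [((h.payload or {}).get("title"), h.score) for h in hits]

print(search_opinions("statute of limitations for breach of contract"))
```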

Trade Settlement Commentary Engine — Event-Driven Settlement Analysis

Event-driven trade settlement system with idempotent consumers, DLQ retries, and AI-assisted commentary generation.

Tech: Java 21, Spring Boot, Kafka, PostgreSQL, Redis, Claude API

Highlights

  • Built Kafka-based settlement processing workflow
  • Added idempotent consumers for repeat-safe event handling (sketched below with DLQ routing)
  • Added DLQ retries for failure isolation and recovery
  • Reduced analysis time from ~20 min to under 10 sec
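
The production system is Java/Spring on Kafka; as a language-neutral Python sketch, idempotent consumption plus DLQ retries comes down to skipping events whose ID was already processed and routing persistent failures to a dead-letter topic. The Redis key scheme, topic name, and publish hook below are invented for the example.

```python
import json
import redis

r = redis.Redis()
MAX_ATTEMPTS = 3

def handle(event: dict) -> None:
    ...  # settle the trade, generate commentary, etc.

def consume(event: dict, publish) -> None:
    """publish(topic, payload) stands in for a Kafka producer."""
    key = f"settled:{event['id']}"
    if r.get(key):
        return  # duplicate delivery: already handled, safe to skip
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handle(event)
            # Mark as processed only after success; note this simple
            # check-then-set is not safe for concurrent consumers of the same key.
            r.set(key, 1, ex=7 * 24 * 3600)
            return
        except Exception:
            if attempt == MAX_ATTEMPTS:
                # Exhausted retries: isolate the failure on a dead-letter topic.
                publish("settlements.dlq", json.dumps(event))
```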

Experience Snapshot

Founding Engineer — GetOnStack
Building AI agent infrastructure focused on observability, cost control, loop detection, and production reliability.

Software Engineer — Tata Consultancy Services
Built backend services, cloud automation, observability systems, RAG workflows, and reliability tooling across production environments.

M.S. Computer Science — Pace University
Coursework: Distributed Systems, Cloud Infrastructure, Operating Systems, Machine Learning, Systems Security, Algorithms.

Certification
AWS Certified Solutions Architect – Associate


Technical Writing

Selected technical writing on AI infrastructure, agent reliability, and production LLM systems.

  • “We Spent $47,000 Running AI Agents in Production” — 92K+ views
  • Essays on AI runtime reliability, agent observability, LLM infrastructure, and production failure modes

Read on Medium


Tech Stack

Languages: Java, Python, Go, TypeScript, C#, C++, SQL, Bash
Backend: REST, gRPC, Kafka, Redis, Spring Boot, FastAPI, .NET, PostgreSQL
Cloud / Infra: AWS, Azure, GCP, Kubernetes, Docker, Terraform, Helm, GitHub Actions
Observability: Prometheus, OpenTelemetry, Grafana, Splunk
AI / LLM: Azure OpenAI, LangChain, vLLM, Qdrant, RAG, semantic caching
Testing: JUnit, PyTest, xUnit, Postman


Contact

Reach me through the Portfolio, LinkedIn, Medium, or Email links at the top of this page.
