Overview
Planning coursework at Penn State means cross-referencing requirements across the cumbersome LionPath interface, RateMyProfessor reviews, dense University Bulletins, and recommended major pathways, a process that can take students hours. This fragmented process is stressful and often leads to suboptimal course selections that can affect GPA and graduation timelines. Course Connect was built to solve this problem by providing a single platform for academic planning.
Developed for HackPSU Spring 2025, Course Connect streamlines course scheduling for Penn State students. After a student uploads a PDF transcript, the platform uses Google's Gemini API to extract their academic history, interpret degree requirements, and generate personalized semester schedules and a four-year plan. This AI-powered approach turns a multi-hour research task into a quick, data-driven decision, helping students build an effective path to graduation.
My Role: Back-End Developer
As the back-end developer on this project, I was responsible for the core architecture and AI integration. My key contributions included building the Node.js RESTful API, integrating the Google Gemini API for schedule generation, and performing the prompt engineering required to get accurate results. I also developed the Python and Node.js scraping scripts and deployed the data acquisition pipeline on a scalable Kubernetes cluster.
Key Features
- Transcript-Powered Planning: Users can instantly bootstrap their academic profile by uploading their official or unofficial transcript. The system automatically parses completed courses, credits, and grades to establish a baseline for all future recommendations.
- AI-Generated Schedules: At its core, Course Connect uses the Google Gemini API to analyze a student's progress against scraped university data, generating optimal semester-by-semester schedules that satisfy prerequisites, major requirements, and general education credits.
- Multi-Source Data Integration: The platform aggregates and synthesizes information from disparate sources, including the official Penn State course catalog, University Bulletins, and recommended major pathways, creating a single source of truth for academic planning.
- Personalized Recommendations: The recommendation engine is designed to consider crucial factors beyond basic requirements, including course difficulty, professor quality metrics, and user-defined preferences like desired time of day or online/in-person class formats.
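To make the Transcript-Powered Planning feature concrete, here is a minimal sketch of the kind of record a parsed transcript could produce. The project itself relied on the Gemini API for extraction; this example swaps in a plain regular expression purely for illustration. The line layout, the `CompletedCourse` type, and the `parse_transcript_text` function are all assumptions, not the project's actual code or Penn State's actual transcript format.

```python
import re
from dataclasses import dataclass

# Hypothetical line layout: "<SUBJECT> <NUMBER> <TITLE> <CREDITS> <GRADE>",
# e.g. "CMPSC 131 Programming and Computation I 3.00 A-".
# A real transcript will likely need a different pattern.
COURSE_LINE = re.compile(
    r"^(?P<subject>[A-Z]{2,6})\s+(?P<number>\d{1,3}[A-Z]?)\s+"
    r"(?P<title>.+?)\s+(?P<credits>\d+\.\d{2})\s+(?P<grade>[A-F][+-]?)$"
)


@dataclass(frozen=True)
class CompletedCourse:
    code: str
    title: str
    credits: float
    grade: str


def parse_transcript_text(text: str) -> list[CompletedCourse]:
    """Extract completed courses from text already pulled out of a PDF."""
    courses = []
    for raw in text.splitlines():
        match = COURSE_LINE.match(raw.strip())
        if match is None:
            continue  # skip term labels, GPA summaries, and other non-course lines
        courses.append(
            CompletedCourse(
                code=f"{match['subject']} {match['number']}",
                title=match["title"],
                credits=float(match["credits"]),
                grade=match["grade"],
            )
        )
    return courses
```

Whatever performs the extraction, normalizing the result into a small typed record like this gives the rest of the pipeline a stable baseline to compare against degree requirements.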
Technologies & Implementation
Course Connect is built on a modern, scalable stack designed for data-intensive AI operations. The backend is powered by Node.js, exposing a RESTful API for managing user data and serving recommendations. The core recommendation logic is handled by the Google Gemini API, which receives a curated context of the user's academic profile and scraped course data.
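As a rough illustration of what "a curated context" might look like when it reaches the model, the sketch below assembles a prompt from a student profile, preferences, and candidate courses. The function name, the dictionary keys, and the instruction wording are all hypothetical, and no Gemini API call is made here; the sketch covers only the prompt-building step.

```python
import json


def build_schedule_prompt(profile: dict, courses: list[dict], preferences: dict) -> str:
    """Combine the academic profile and scraped course data into one prompt string."""
    # Hypothetical context layout; the real project may structure this differently.
    context = {
        "student": profile,
        "preferences": preferences,
        "candidate_courses": courses,
    }
    instructions = (
        "You are an academic planning assistant for Penn State students.\n"
        "Using only the JSON context below, propose next semester's schedule.\n"
        "Respect every prerequisite and stay within the credit limit.\n"
        "Reply with a JSON array of course codes.\n\n"
    )
    # sort_keys keeps the serialized context stable between runs,
    # which makes prompts easier to diff and debug.
    return instructions + json.dumps(context, indent=2, sort_keys=True)
```

Serializing the context as JSON and asking for a JSON reply is one common way to keep model output machine-parseable on the Node.js side.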
Data acquisition is managed by a suite of scraping scripts written in Python (using BeautifulSoup/Scrapy) and Node.js (using Puppeteer), which are containerized with Docker and orchestrated via Kubernetes to ensure consistent and reliable data collection from university sources. This data is stored in a relational PostgreSQL database, chosen for its robust querying capabilities and data integrity. The frontend is a clean, responsive web interface built with React.
graph TD
subgraph "Data Acquisition (Automated)"
A1["Scraping Scripts (Python/Node.js)"] --> A2["Penn State Bulletins"]
A1 --> A3["Course Scheduler"]
A1 --> A4["Major Pathways"]
end
subgraph "User Interaction"
B1["User Uploads Transcript"] --> C["React Frontend"]
C --> D["REST API (Node.js)"]
B2["User Sets Preferences"] --> C
end
subgraph "Backend Processing"
A1 --> E["PostgreSQL Database"]
D --> F{"Request Schedule"}
F -- "User Data & Preferences" --> G["Gemini API Prompt Engine"]
E -- "Scraped University Data" --> G
G -- "Generates Plan" --> F
F --> H["Return Optimized Schedule"]
end
H --> C
Challenges & Solutions
One of the primary technical hurdles was managing the context limitations of the Gemini API. Our initial approach of feeding all scraped university data into the model was inefficient and exceeded the context window. I solved this by implementing a data curation layer that pre-processes the data and selects only the most relevant information, such as the student's specific major requirements and the prerequisite chains for potential courses, before sending it to the API. This significantly improved both performance and the accuracy of the recommendations.
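One way such a curation step could work is sketched below: start from the major requirements the student has not yet completed, follow prerequisite links to collect every course those requirements depend on, and send only those catalog entries to the model. The function names and data shapes (`catalog`, `prereqs`) are assumptions for illustration, not the project's actual implementation.

```python
from collections import deque


def prerequisite_closure(targets, prereqs):
    """Return the targets plus every course reachable through prerequisite links."""
    seen = set()
    queue = deque(targets)
    while queue:
        code = queue.popleft()
        if code in seen:
            continue  # already visited; also guards against cyclic data
        seen.add(code)
        queue.extend(prereqs.get(code, ()))
    return seen


def curate_context(catalog, major_requirements, completed, prereqs):
    """Select only the catalog entries relevant to the student's remaining work."""
    remaining = [code for code in major_requirements if code not in completed]
    relevant = prerequisite_closure(remaining, prereqs) - set(completed)
    # Sorting gives a deterministic order; unknown codes are dropped silently.
    return [catalog[code] for code in sorted(relevant) if code in catalog]
```

For a student who has finished only CMPSC 131, a catalog of thousands of courses collapses to the handful still needed for the major plus their unmet prerequisites, which keeps the prompt within the context window.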
Another major challenge was scraping and maintaining up-to-date information from various Penn State web sources, which are prone to structural changes and anti-scraping measures. To overcome this, I architected resilient, modular scraping scripts with comprehensive error handling and logging. I also designed a flexible database schema that could accommodate variations in data structure, ensuring the system could adapt to website changes without complete failure. This approach emphasized resilience and maintainability, which are critical for a data-dependent application.
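The error-handling idea described above can be sketched as a retry wrapper with logging, plus a loop that isolates each source so one failure does not stop the others. This is a minimal illustration under stated assumptions: `run_with_retries`, `scrape_all`, the source names, and the backoff settings are hypothetical, and each `fetch` stands in for a real scraper function.

```python
import logging
import time

logger = logging.getLogger("scraper")


def run_with_retries(name, fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff; return None if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # broad on purpose: any scraper error should be contained
            logger.warning(
                "source=%s attempt=%d/%d failed: %s", name, attempt, attempts, exc
            )
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return None


def scrape_all(sources, **retry_options):
    """Run every scraper independently and keep whatever succeeded."""
    results = {}
    for name, fetch in sources.items():
        data = run_with_retries(name, fetch, **retry_options)
        if data is None:
            # Note: a scraper that legitimately returns None is treated as failed.
            logger.error("source=%s skipped after repeated failures", name)
            continue
        results[name] = data
    return results
```

Passing `sleep` as a parameter keeps the backoff testable without real delays, and the per-source log lines make it easier to spot which site changed its structure.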
Results & Impact
As a hackathon project, Course Connect demonstrated a working proof-of-concept for AI-driven academic planning. We designed the core architecture, developed a functional API for user management and recommendation retrieval, and integrated the Google Gemini API to power the core logic. The project laid a solid foundation for a tool with the potential to save thousands of students hours of manual research and prevent costly scheduling mistakes.
Our future plans involve a full feature implementation, including direct RateMyProfessor integration, visual calendar views for proposed schedules, and refining the recommendation algorithm based on user feedback. The ultimate vision is to create a scalable platform that could be adapted to support students at other universities, fundamentally changing how academic planning is done.
