Aspiring Data Scientist / ML Engineer / AI Researcher

I'm Simone Filosofi

"I try to make data useful. Sometimes the data cooperates."

About Me

Who I Am

I’m a Data Science MSc student at LUISS Rome by day and a debugger of mysterious model behavior by night. I call it research; my laptop calls it a cry for help.

I hold a BSc in Computer Science & Management. My thesis, in which I applied NLP to the GenAI job market, was a fascinating dive into the future of work and mostly confirmed my suspicion: the machines are just as confused as we are by the "5 years of experience in a 2-year-old technology" requirement.

I also survived a Maymester at USC studying Probability Theory, which mostly taught me that unlikely things happen all the time.

Technical Skills

Languages
Python SQL R
ML & AI
Machine Learning Scikit-learn XGBoost NLP HuggingFace Transformers Algorithms
LLMs & RAG
LangChain LlamaIndex pgvector
Data
Pandas NumPy SMOTE Jupyter Web Scraping
Databases
MySQL MongoDB Supabase
Dev & Infra
FastAPI React Git Docker CI/CD
Analytics
Power BI NetworkX Data Visualization

Resume

Education

Sep 2025 — Ongoing
MSc Data Science
LUISS University Rome
Coursework in Advanced Statistics, Machine Learning, Data Science, and Data Privacy & Security. Building the mathematical foundations of AI at LUISS while driving technical initiatives for the Google Developers Club.
Sep 2022 — Jun 2025
BSc Computer Science & Management
LUISS University Rome
Thesis: "Decoding the GenAI Workforce: an NLP and ML analysis of evolving U.S. labor market demands, featuring a Healthcare deep dive."
Final grade: 110/110 cum laude
June 2024
Math 407 — Probability Theory
University of Southern California
Exchange Maymester coursework in advanced probability theory and statistical inference.

Research & Experience

February 2026 — Ongoing
Researcher and IT member
Google Developers Club
Developing practical campus tools, including a browser-based Apple Wallet integration for student badges adopted by the LUISS community, and architecting internal RAG systems to streamline the club's knowledge management and operations.
Apr 2025 — Sep 2025
AI Tutor
Make4Work — Rome, Italy
Delivered two full editions of a specialised AI course for schoolteachers, covering everything from the foundations to practical classroom applications. Designed hands-on activities, facilitated live discussions, and helped educators move from "AI sounds scary" to "AI is a tool I can actually use" in six months flat.
Oct 2024 — Dec 2024
Data Analytics Intern
Procter & Gamble
Conducted exploratory data analysis on sales and marketing datasets to identify trends and opportunities. Developed predictive models to support inventory management and pricing strategies. Used Python and SQL for data manipulation and visualization.
May 2023 — Mar 2024
Pre-seed AI Engineer
FireGen AI
Partnered with the founder to conceptualize and build the initial MVP from the ground up. Focused on the high-level reasoning frameworks and retrieval logic of the core RAG system, and implemented the primary API integrations that delivered functional AI solutions for early-stage enterprise testing.

Projects

LUISS-badge wallet integration
A tool designed for students that turns your LUISS badge QR code into an Apple Wallet pass, so you can tap in from your lock screen without opening an app, logging in, or resetting your password in the rain at 8am. Upload your QR code, add your name, download the pass. That's it.
JavaScript HTML CSS
RAGnarok
Free, multi-user RAG app for PDF Q&A with streamed, citation-backed responses. Features semantic search via HNSW indexing, per-user data isolation through JWT authentication and Row-Level Security, and a bring-your-own-key (BYOK) Groq API integration. The result is zero-cost hosting with production-grade security.
Python React FastAPI Supabase pgvector Groq HuggingFace
Decoding GenAI Workforce
BSc thesis. Empirical analysis of 2,726 U.S. job postings (2023–2025) tracking how generative AI reshaped labor demand. Combines TF-IDF, fuzzy matching, and an SVM for job title standardisation (97% accuracy), LDA topic modelling across 9 clusters, and temporal trend analysis that traces the shift from "uses ChatGPT" to "builds proprietary GenAI".
Python NLP scikit-learn gensim spaCy Plotly
Brain
Minimal CLI note-taking tool backed by SQLite. Add, search, edit, and delete notes from the terminal, with Rich formatting, because plain text is boring. Built because my actual brain is busy forgetting things.
Python SQLite Typer Rich
STLA & WMT Financial Analysis
Comparative financial deep dive on Stellantis and Walmart. Covers 5-year balance sheet trends, CAPM beta estimation, stock and bond valuation (DDM and comparables), capital structure analysis, and portfolio risk-return optimisation, all built on historical Yahoo Finance data.
Python Jupyter Pandas CAPM
Job Market Analytics
Dual SQL/NoSQL analytics platform over a job listings dataset. MySQL handles the relational queries: gender wage gaps, salary by degree, and company presence by job portal. MongoDB handles the document queries: top skills by company and executive analysis by sector. Same problem, two paradigms.
Python MySQL MongoDB PyMongo
Accenture Automotive
Consulting case study on the viability of a Guaranteed Used Vehicle program for a luxury automotive brand. Analyses depreciation patterns across GT, SUV, and EV segments using ~2,400 French used-car listings. Key finding: the program isn't profitable at a 5% margin without repricing.
Python Jupyter Pandas Seaborn
Social Network Analysis
Graph analysis of character interactions in Forrest Gump. Implements betweenness, closeness, and decay centrality from scratch, plus a custom PageRank, three community detection algorithms, and link prediction (Jaccard, Adamic-Adar, Resource Allocation). Visualised in Gephi.
Python NetworkX Gephi Graph Theory
Customer Satisfaction Classifier
ML pipeline predicting passenger satisfaction for ThomasTrain without direct feedback. Compares Logistic Regression, Decision Trees, and Random Forest with full hyperparameter tuning. Top finding: boarding experience and travel purpose (leisure vs. business) are the strongest predictors.
Python scikit-learn Random Forest Jupyter
Full ML pipeline to predict customer churn in the U.S. telecom industry. Covers linear models with AIC/BIC stepwise selection, penalised regression (Ridge, Lasso, Elastic Net), non-linear methods (k-NN, GAM, tree ensembles), and clustering (K-Means and hierarchical) for customer segmentation, all in R.
R Machine Learning Clustering R Markdown