Transforming data into actionable insights through innovative solutions
Feb 2025 - Present
Dec 2022 - Feb 2025
Jun 2021 - Aug 2022
Apr 2018 - Jun 2021
Jan 2023 - May 2023
Designed a predictive analytics pipeline combining Random Forest, CatBoost, and Hugging Face BERT models, reaching 67% accuracy and enabling data-driven investment insights with optimized model deployment workflows. Ran A/B tests across different feature sets to evaluate model performance, improving prediction accuracy by 5%.
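For illustration, here is a minimal sketch of how the tabular side of such a pipeline could be wired up with scikit-learn and CatBoost. The synthetic dataset, estimator settings, and soft-voting scheme are placeholder assumptions, and the BERT text features from the actual pipeline are omitted.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier

# Placeholder data standing in for the real investment feature set
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Soft-voting ensemble: average the predicted probabilities of both models
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("cb", CatBoostClassifier(iterations=200, verbose=0, random_state=42)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("Held-out accuracy:", ensemble.score(X_test, y_test))

Soft voting is one reasonable way to combine the two tree-based learners; the accuracy figures above come from the project itself, not from this sketch.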
Aug 2022 - May 2024
Graduate Teaching Assistant: Supported 3 graduate-level courses; guided 100+ students in Machine Learning, Data Analysis, and Applied Statistics concepts and labs.
Demonstrating expertise in cloud-based data pipelines & real-time data solutions
Represented the college at the ACM International Collegiate Programming Contest (ACM ICPC), Amritapuri Regional
Co-founded a web development firm (WebStories), developing bespoke websites for diverse clients
PyCon 2024, Pittsburgh: Documented the datetime64("now") feature in the NumPy library
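For reference, a quick example of the NumPy behavior covered in that documentation work; the printed timestamp is illustrative.

import numpy as np

# np.datetime64("now") captures the current UTC time at second resolution
ts = np.datetime64("now")
print(ts)        # e.g. 2025-02-01T14:03:22
print(ts.dtype)  # datetime64[s]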
"Shrey demonstrated exceptional expertise in developing NLP models and optimizing data processing pipelines. His work on biomedical document classification significantly improved our system's accuracy."
"Shrey's implementation of RAG pipelines with LLMs was outstanding. He consistently delivered high-quality solutions and showed strong problem-solving abilities."
# NLP Model for Document Classification
import torch
from transformers import AutoTokenizer, AutoModel

class DocumentClassifier:
    def __init__(self, model_name='bert-base-uncased'):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.model.eval()

    def classify(self, text):
        # Tokenize and truncate to the model's maximum input length
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True)
        with torch.no_grad():
            outputs = self.model(**inputs)
        # Mean-pool token embeddings into a single document representation
        return outputs.last_hidden_state.mean(dim=1)

# Achieved 22% accuracy improvement
classifier = DocumentClassifier()
result = classifier.classify("Biomedical document text...")
-- Optimized SQL Query for Data Pipeline
WITH ranked_data AS (
    SELECT
        id,
        category,
        score,
        ROW_NUMBER() OVER (
            PARTITION BY category
            ORDER BY score DESC
        ) AS score_rank
    FROM documents
    WHERE processed_date >= DATEADD(day, -30, GETDATE())
)
SELECT
    category,
    AVG(score) AS avg_score,
    COUNT(*) AS total_count
FROM ranked_data
WHERE score_rank <= 100
GROUP BY category
-- Reduced query time by 12%
# RAG Pipeline Implementation
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class RAGModel(nn.Module):
    def __init__(self, encoder_name='sentence-transformers/all-MiniLM-L6-v2'):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)  # 384-dim sentence encoder
        self.retriever = nn.Linear(384, 128)    # projects the fused query/context embedding
        self.generator = nn.Linear(128, 50257)  # scores over a GPT-2-sized vocabulary

    def embed(self, text):
        # Tokenize raw text before passing it through the encoder
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True, padding=True)
        return self.encoder(**inputs).pooler_output

    def forward(self, query, context):
        # Fuse query and context embeddings, then project to vocabulary logits
        retrieved = self.retriever(self.embed(query) + self.embed(context))
        return self.generator(retrieved)

# Reduced query resolution time by 16%
Feel free to reach out for opportunities or collaborations
(312) 934-6334
jaradishrey@gmail.com
Chicago, IL