MLE / Data Science / AI for Science

Jie Hou

Building ML systems for data and science.

M.S. Data Science student at UC San Diego bridging domain expertise with modern AI. Focused on machine learning engineering, data pipelines, and robust evaluation systems for complex real-world and scientific datasets.

UCSD M.S. Data Science, 2025-2027
8 Publications spanning energy, semiconductors, imaging, and spectroscopy
AI+Sci Applying AI to traditional science and engineering domains

About

Research-trained,
AI-focused, implementation-ready.

My work centers on a practical question: how can modern machine learning help traditional engineering and scientific fields move faster, measure better, and make better decisions from complex data?

Operating Style

I like systems that can be defended: reproducible pipelines, honest validation, clear assumptions, and interfaces that let scientists, engineers, and product teams understand model behavior before relying on it.

Python PyTorch Pandas SQL Experimentation AWS

Capabilities

From scientific data to deployable ML systems.

I work across the modeling stack: data pipelines, statistical reasoning, ML evaluation, and domain-aware analysis for product and scientific datasets.

portfolio.signal.pipeline
> define_problem()
metric = "scientific decision quality"
constraints = ["noise", "bias", "domain shift"]

> build_pipeline()
steps = [
  "clean experimental data",
  "engineer physics-aware features",
  "train and validate models",
  "ship reproducible inference"
]

> ship_insight()
output = "model + uncertainty + domain context"
01

Machine learning engineering

Modeling pipelines, evaluation loops, feature stores, batch inference, and deployment-minded engineering for real-world data products.

02

Scientific data modeling

Feature engineering and model validation for noisy measurements, spectra, microscopy, diffraction, energy systems, and materials workflows.

03

Experimentation & causal thinking

Power analysis, KPI design, uncertainty-aware comparison, and careful claims when moving from measured effects to decisions.

04

Data product thinking

Interfaces for communicating model diagnostics, metric narratives, scientific context, and decision-ready summaries to mixed audiences.

Selected Projects

Applied work for MLE and Data Scientist roles.

These projects span experimentation, signal modeling, and NLP, and highlight practical modeling, evaluation, and communication skills.

01

Experimentation & Causal Impact Analysis

Designed A/B test workflows, power analysis, and statistical evaluation patterns for deciding whether a product or modeling change is truly moving the target metric.
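To make the power-analysis step concrete, here is a minimal sketch of a per-arm sample-size estimate for a two-sided two-proportion z-test. It is an illustration only, not the project's actual code: the function name `sample_size_per_arm`, the baseline and target conversion rates, and the default `alpha` and `power` values are all assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-sided two-proportion z-test.

    Uses the standard normal-approximation formula with unpooled variances.
    All argument values are hypothetical inputs supplied by the analyst.
    """
    if p_control == p_treatment:
        raise ValueError("The two rates must differ to define an effect size.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical scenario: detect a lift from a 10% to a 12% conversion rate.
n = sample_size_per_arm(0.10, 0.12)
print(f"Users needed per arm: {n}")
```

Smaller effects require sharply larger samples, because the required size scales with the inverse square of the difference between the two rates.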

A/B Testing Power Analysis Decision Science
02

Scientific Signal Modeling Framework

Developed a reusable conceptual framework for turning noisy measurements into features, validating signal quality, and communicating uncertainty before a model is used downstream.

Signal Modeling Feature Engineering Decision Rules
03

User Intent Retrieval & NLP Safety Modeling

Built an end-to-end NLP pipeline that fine-tunes DistilBERT under extreme class imbalance, raising rare-event detection F1 to 0.84 and reducing false positives in simulation.
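Two pieces of arithmetic sit behind any rare-event evaluation like this one: class weights that counter the imbalance, and an F1 score computed on the rare class alone. The sketch below shows both in plain Python. It is not the project's pipeline: the helper names `balanced_class_weights` and `precision_recall_f1` and the toy labels are assumptions for illustration, and the weighting rule is the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class treated as the rare event."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (hypothetical): 5 rare events among 100 training examples.
train_labels = [0] * 95 + [1] * 5
print(balanced_class_weights(train_labels))

# Toy predictions (hypothetical) on a small held-out set.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```

Reporting F1 on the rare class, rather than overall accuracy, keeps a model that always predicts the majority class from looking good.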

DistilBERT Imbalanced Data Evaluation

Research & Publications

AI for Science starts with understanding science.

Selected publications connect my earlier work in materials, energy systems, semiconductors, spectroscopy, and computational imaging with my current focus on machine learning and scientific data systems.

Research Direction

From physics and engineering research to AI systems for materials, semiconductors, and energy.

A foundation in physical-science research informs a practical approach to AI: respect the measurement process, model uncertainty, and build tools that scale scientific decision-making.

Solid oxide fuel cell schematic and durability comparison
Energy Systems

A Solid Oxide Fuel Cell Runs on Hydrocarbon Fuels with Exceptional Durability and Power Output

Advanced Energy Materials, 2022. Work on durable energy devices, catalyst design, and performance measurement under real operating constraints.

SOFC Catalysts Energy Materials
Double perovskite crystal structure from Figure 4
Semiconductor & Materials Data

Powder X-ray structural analysis and bandgap measurements for double perovskites

Materials measurement work linked to NIST: analyzed complex X-ray diffraction patterns, bandgap measurements, and structure-property relationships to compile reproducible reference data.

XRD Bandgap Perovskites
MgO magnetic tunnel junction review composite with Figure 7 and Figure 18a
Semiconductor Devices

Electromagnetic Radiation Effects on MgO-Based Magnetic Tunnel Junctions: A Review

Co-authored review on semiconductor and spintronic devices, summarizing radiation environments, MgO tunnel barriers, and microstructural evidence for robust electronics.

MTJ MgO Barriers Spintronics
Crystal structure visualization for oxide materials
Crystal Chemistry

Crystal chemistry and phase equilibria of the CaO-Ho2O3-CoOz system

Investigated complex phase relationships and structured scientific datasets, laying the groundwork for incorporating thermodynamic constraints and domain knowledge into machine learning models.

Phase Equilibria Oxides Structure
Computational ghost imaging schematic with structured illumination
Computational Imaging

Improvements of computational ghost imaging using Special-Hadamard patterns

Developed computational imaging algorithms for signal recovery and image reconstruction, bridging classical optics with modern signal processing and machine learning workflows.
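For readers unfamiliar with the technique, the core idea of Hadamard-based computational ghost imaging can be sketched in a few lines: illuminate the scene with each row of a Hadamard matrix, record one "bucket" intensity per pattern, and recover the image from the orthogonality of the rows. The sketch below uses ordinary Sylvester-ordered Hadamard patterns with ideal, noise-free measurements. It does not reproduce the Special-Hadamard patterns from the paper, and the function names and the toy 4x4 scene are assumptions for illustration.

```python
def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix (n a power of two) as nested lists."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def measure(patterns, scene):
    """Bucket signal per pattern: inner product of the pattern and the scene."""
    return [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]


def reconstruct(patterns, bucket):
    """Recover the scene using H^T H = n I for Hadamard patterns."""
    n = len(patterns)
    return [sum(bucket[k] * patterns[k][i] for k in range(n)) / n for i in range(n)]


# Toy 4x4 scene (hypothetical), flattened row by row into 16 pixels.
scene = [0, 1, 1, 0,
         1, 0, 0, 1,
         1, 0, 0, 1,
         0, 1, 1, 0]
patterns = sylvester_hadamard(len(scene))
bucket = measure(patterns, scene)
recovered = reconstruct(patterns, bucket)
print(recovered == [float(v) for v in scene])
```

Physical light modulators cannot display negative values, so in practice each pattern of +1 and -1 entries is typically realized as a pair of binary patterns whose bucket signals are subtracted. The sketch skips that step.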

Imaging Algorithms Signal Recovery

Experience & Education

A technical path through science, data, and AI.

The timeline connects prior research training with current goals in machine learning engineering and data science.

2025 - 2027

M.S. in Data Science

University of California San Diego

2020 - 2023

Graduate Research Assistant

Georgia Institute of Technology. Conducted research on experimental data analysis.

2019 - 2020

Research Intern

National Institute of Standards and Technology. Assisted with physical sciences research and data collection.

2015 - 2019

B.S. in Physics

Delaware State University

Contact

Build AI systems that respect the science.

Open to conversations around machine learning engineering, data science, AI for Science, materials and energy data, experimentation, and research collaboration.