Data Science CLAUDE.md
CLAUDE.md template for data science projects with notebook conventions, data handling, and reproducibility.
Data Science Prompt
Project Instructions
Notebooks
- Keep notebooks focused on one analysis or experiment.
- Clear all outputs before committing so that cell outputs never end up in git.
- Move reusable logic into .py modules. Notebooks are for exploration and presentation, not library code.
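One way to clear outputs before a commit is sketched below. It assumes the standard .ipynb JSON layout (a top-level "cells" list whose code cells carry "outputs" and "execution_count"); the function names are hypothetical, and a dedicated tool such as nbstripout may suit a real project better.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed."""
    # Round-tripping through JSON gives a cheap deep copy of JSON-only data.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def clear_notebook_file(path: Path) -> None:
    """Rewrite an .ipynb file in place without cell outputs."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    cleaned = clear_outputs(notebook)
    path.write_text(json.dumps(cleaned, indent=1) + "\n", encoding="utf-8")
```

Calling `clear_notebook_file(Path("analysis.ipynb"))` (an illustrative filename) before staging keeps diffs limited to code and markdown changes.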
Data
- Never commit raw data to the repo. Document where to get it and how to set it up.
- Use relative paths for data files. Don’t hardcode absolute paths.
- Document data schemas and assumptions in comments or a data dictionary.
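The data rules above could look roughly like this in code. This is a minimal sketch: the `data/` directory, the `sales.csv` file, and the column names are hypothetical placeholders, not part of the template.

```python
import csv
from pathlib import Path
from typing import Sequence

# Hypothetical schema for an illustrative file data/raw/sales.csv.
# Keeping it in code doubles as a small data dictionary.
EXPECTED_COLUMNS = ("order_id", "order_date", "amount_usd")


def data_path(project_root: Path, *parts: str) -> Path:
    """Build a path under <project_root>/data instead of hardcoding an absolute path."""
    return project_root.joinpath("data", *parts)


def check_columns(csv_path: Path, expected: Sequence[str] = EXPECTED_COLUMNS) -> None:
    """Raise ValueError if the CSV header lacks a documented column."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    missing = [name for name in expected if name not in header]
    if missing:
        raise ValueError(f"{csv_path} is missing columns: {missing}")
```

Failing early on a schema mismatch makes a stale or partial download visible before any analysis runs on it.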
Reproducibility
- Pin all dependencies with exact versions.
- Set random seeds for any stochastic process.
- Document the steps to reproduce results from scratch.
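Seeding can be centralized in one helper, as in the sketch below. The seed value and the `set_seeds` name are arbitrary illustrations; the helper covers only Python's `random` module and NumPy's legacy global generator, so any other library with its own random state would need a separate call.

```python
import random

SEED = 42  # arbitrary illustrative value; record whichever seed you actually use


def set_seeds(seed: int = SEED) -> None:
    """Seed the random number generators this project is assumed to use."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        # NumPy is optional in this sketch; skip it if it is not installed.
        return
    np.random.seed(seed)
```

Calling `set_seeds()` once at the top of a notebook or script, before any sampling or shuffling, keeps reruns comparable.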
Code Quality
- Add type hints to functions and docstrings to anything non-obvious.
- Use pandas and numpy idiomatically. Avoid row-by-row loops when vectorized operations work.
- Keep data transformations in named functions, not inline chains that span 20 lines.
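The code-quality points above might combine as follows. The column names (`region`, `quantity`, `unit_price`) and function names are hypothetical; the sketch only illustrates typed, named transformations that use vectorized pandas operations instead of a row-by-row loop.

```python
import pandas as pd


def add_revenue(orders: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `orders` with a `revenue` column (quantity * unit_price)."""
    # Column arithmetic is vectorized: no iterrows() or per-row apply needed.
    return orders.assign(revenue=orders["quantity"] * orders["unit_price"])


def revenue_by_region(orders: pd.DataFrame) -> pd.Series:
    """Total revenue per region, computed with a grouped aggregation."""
    return add_revenue(orders).groupby("region")["revenue"].sum()
```

Each step has a name and a docstring, so a notebook cell can call `revenue_by_region(orders)` without carrying a long inline chain.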
Git
- Write descriptive commit messages: “add feature importance analysis” not “update notebook.”
- Use .gitignore for data files, model artifacts, and notebook checkpoints.