
I'm still not sold personally. It seems like the in-memory persistence is only useful for the intermediate case where my data is slow enough to generate or obtain that I don't want to rerun the code for it every time, but fast enough that I don't mind regenerating it every time I launch the editor. Most of the data I have that's worth caching due to speed is worth caching to disk. Combine that with the unpredictable side effects of variables persisting whilst I'm actively hacking on the code, and implicit in-memory persistence is pretty off-putting.

A recent workflow I've had for a data analysis project is to have each stage of data processing in a separate function, with all the functions called in order from an "if __name__ == '__main__'" block, and all but the function I'm presently working on commented out. Each function returns nothing, but saves its data to an HDF5 file. Other functions read the inputs they need from the HDF5 file and write their outputs to the same file. If I want a fresh run I just delete the file, uncomment everything in the '__main__' block and run again.

The functions also save output plots to subfolders.

This is compatible with version control, and it caches on disk rather than just in memory.
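
A rough sketch of that layout, for anyone who wants to picture it. To keep it runnable with only the standard library, a JSON file stands in for the HDF5 file here (with HDF5 you would swap load/save for reads and writes of datasets). The file path, the function names and the toy data are all made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical location of the single on-disk store (JSON standing in for HDF5).
STORE = os.path.join(tempfile.gettempdir(), "analysis_store.json")


def load():
    """Return everything cached so far, or an empty dict on a fresh run."""
    if not os.path.exists(STORE):
        return {}
    with open(STORE) as f:
        return json.load(f)


def save(key, value):
    """Add one named result to the store file."""
    data = load()
    data[key] = value
    with open(STORE, "w") as f:
        json.dump(data, f)


def generate_raw():
    # Stand-in for the slow step whose output is worth caching.
    save("raw", [1.0, 2.0, 3.0, 4.0])


def compute_mean():
    # Reads its input from the store and writes its output back to it.
    raw = load()["raw"]
    save("mean", sum(raw) / len(raw))


if __name__ == "__main__":
    generate_raw()  # comment out once its output is on disk
    compute_mean()
    print(load()["mean"])
```

Each stage returns nothing and only talks to the file, so commenting out generate_raw() after the first run leaves compute_mean() working from the cached copy. Deleting the file gives a fresh run.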

The biggest downsides compared to Jupyter notebooks are the lack of interactivity in the saved plots (I can make interactive plots pop up of course, but they're all in separate windows all at once, so it's less clear which part of the code each plot came from) and the lack of LaTeX in code comments. I still have to keep external LaTeX documents somewhere explaining what algorithm I'm using.

So for now, the downsides of notebooks with respect to version control, data caching and extra state that I have to remember in order to not hit subtle bugs in my code as I hack on it, seem to outweigh the upsides.

Maybe what I would like is an editor that renders LaTeX in comments, and which embeds arbitrary plot windows at given points in the code, but without any data persistence, and without the embedded plots actually being saved anywhere - your file is still a normal Python file and it's just the editor rendering things that way based on magic comments or something.

Or maybe I should just write a decorator that renders a function's docstring as LaTeX and embeds any matplotlib windows produced into one scrolling document, with the sections named after the decorated functions. The decorator could take an argument telling it whether to include the full source of the function, the comments of which it could also render as LaTeX. Then you have input code compatible with your favourite text editor and version control, and an output document which optionally includes the code.
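
Something like the following could be a starting point for the decorator idea. It is only a sketch under heavy assumptions: the names section, render and SECTIONS are invented, the output is a plain-text document rather than rendered LaTeX (docstrings are passed through verbatim for some later renderer), and capturing matplotlib figures is left out entirely.

```python
import inspect

# (name, docstring, source or None) for each decorated function, in order.
SECTIONS = []


def section(include_source=False):
    """Register the decorated function as one section of the report."""
    def decorate(func):
        source = None
        if include_source:
            try:
                source = inspect.getsource(func)
            except (OSError, TypeError):
                # Source is not always retrievable, e.g. for code run via exec.
                source = "# source unavailable"
        SECTIONS.append((func.__name__, inspect.getdoc(func) or "", source))
        return func  # the function itself is left unchanged
    return decorate


def render():
    """Build one plain-text document with a section per decorated function."""
    parts = []
    for name, doc, source in SECTIONS:
        parts.append(name + "\n" + "-" * len(name))
        parts.append(doc)  # LaTeX in the docstring is kept verbatim
        if source is not None:
            parts.append(source)
    return "\n\n".join(parts)


@section()
def fit_decay():
    r"""Fit $N(t) = N_0 e^{-t/\tau}$ to the counts."""


@section(include_source=True)
def plot_residuals():
    """Plot the fit residuals."""


if __name__ == "__main__":
    print(render())
```

The input stays an ordinary Python file, and the document is a by-product you can regenerate at any time, so nothing extra ends up in version control.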


