I have been working as a quant/dev for the past 15 years.
I consider myself an experienced dev. I have rather extensive C# experience, but my Python experience is more limited. Outside of my home coding, I have always worked in places where others take care of packaging/deployment.
I really want (and need) to grow my ML knowledge.
I have been trying to get started on the book and just discovered this community.
I am quite excited and ready to put the effort throughout the chapters but first things first, I need to get the code running.
My understanding of the installation process is that we need Conda for some packages and pip for others (or a pip equivalent like uv?).
I am trying to follow the instructions, but they seem to imply that I am creating two different environments.
I am running the code in VS Code. How do I reconcile those two environments?
I got some of the early chapters running in a separate “copy” by using uv as a package manager and installing only the packages that were required. However, I hit a roadblock when Zipline came into play.
Also, at the moment I am running everything locally. I have a more than decent laptop with 64GB of RAM, but will I need more horsepower?
I have seen people mentioning running this in Google Colab, but that would only add to the initial learning curve for me.
Yes, you will definitely need pip, and it should already be installed alongside the Python interpreter you use with the Microsoft Python extension in VS Code.
You can learn how to do local environments in VS Code, but personally, I still prefer Conda. There are thousands of libraries out there and not all are compatible with each other … I find Conda the easiest way to make sure I am installing compatible libraries. I probably have 20 different environments set up depending on the project. Once you set up a folder for a project, you can select the local kernel to run in it, and that’s where all the environment stuff gets loaded, but add all your imports to the code first just to make sure. Happy coding.
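To make that concrete, here is roughly what a single “Conda plus pip” environment looks like for me. The environment name, Python version, and package list below are placeholders, so swap in whatever the book’s installation instructions actually specify:

```bash
# Create one Conda environment and do everything inside it
# ("ml4t", the Python version, and the packages are placeholders, not the book's exact spec)
conda create -n ml4t python=3.8
conda activate ml4t

# Install what you can from conda-forge first...
conda install -c conda-forge numpy pandas scikit-learn jupyter ipykernel

# ...then use pip *inside the same activated environment* for anything conda cannot resolve
pip install zipline-reloaded   # the maintained Zipline fork on PyPI, if that is the one you end up needing

# Optional: register the environment as a named Jupyter kernel
python -m ipykernel install --user --name ml4t
```

In VS Code you then point the editor at that environment via “Python: Select Interpreter” (or pick it as the notebook kernel). Both the conda-installed and pip-installed packages end up in the same place, so there is really only one environment to reconcile.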
I know how to use pip and create venvs. It is Conda that I am not familiar with.
My “issue” with the book’s proposed setup is that it seems to want a bit of both, and I have no idea how to get started.
For now, I am only using uv as a package manager (pip-compatible, as I understand it). This means I may not be able to get Zipline installed, but I’ll (try to) bridge that gap when I get there.
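For anyone following the same route, the uv workflow I am describing is roughly this (the package names are just examples, not the book’s full requirement list):

```bash
# Create a virtual environment with uv and use its pip-compatible installer
uv venv .venv                       # creates ./.venv
source .venv/bin/activate           # on Windows: .venv\Scripts\activate
uv pip install numpy pandas scikit-learn matplotlib

# Zipline is the sticking point; the maintained fork on PyPI is zipline-reloaded,
# but I have not gotten it to install cleanly this way yet
# uv pip install zipline-reloaded
```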
Really struggling with data access at the moment.
Best,
I have been able to use both pip and conda in a conda environment. I mostly use conda environments. I have Anaconda installed in WSL Debian, and I can just run code . and do everything in VS Code. I started from scratch about 6 months ago using this… that’s how I learned to get my Python environment to talk to VS Code. I have yet to get Zipline to work. Conda comes with a cheatsheet that I keep on my desktop for reference. What type of data are you looking for?
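In case it helps, my day-to-day loop looks roughly like this (the environment name and packages are just examples):

```bash
# From a WSL (Debian) shell with Anaconda already installed
conda activate ml4t                    # whatever environment you created for the book
conda install -c conda-forge pyarrow   # conda for the libraries it resolves well...
pip install alphalens-reloaded         # ...and pip inside the same activated env for the rest
code .                                 # opens the current folder in VS Code via its WSL integration
```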
Thanks for the reply.
I’ll check the link if/when my non-Conda environment is not enough.
Regarding data, I have lost quite some time with the data from Nasdaq ITCH, same issue as in this link,
In general, I guess I was naively hoping the data part would work out of the box. I am going through it at the moment before I dive deeper into the “models” chapters.
If you can get away without the ITCH data, you should be fine, although I was able to pull it last year. I have not tried this year. I trade futures, and most of the book is geared towards stocks, not derivatives. Your system should be more than fine if you have a GPU with ~8GB of memory, and it would not hurt to have an external SSD for the data… the ITCH data for one market session is ~50GB.
The models are fun; once you get the hang of scoring accuracy and making sense of how to use them for inference, you should be ready to cook. This is worth investigating for data… here’s the repo.
To be fair, the book is 4 years old and a lot has changed, like the CHRIS dataset from Nasdaq going dark, so it would not hurt to run the code through a chatbot for auditing. A 3-month investment in this would not be fruitless. What would be great is a third edition.
According to this, the book is meant to be used as a toolkit. I use it as a map on a road trip, not like a GPS.