Chapter 2 - Parse ITCH order flow - 4.3.1 Issue reading xlsx with pandas - Upgraded to fix


I'm just starting the book, and while working through the Jupyter notebook for chapter 2 I got an error when running pd.read_excel('message_types.xlsx', sheet_name='messages').

Notebook: machine-learning-for-trading/01_parse_itch_order_flow_messages.ipynb at master · stefan-jansen/machine-learning-for-trading · GitHub
section 4.3.1 Load Message Types

The first error said "No module xlrd", because version 0.22.0 of pandas tries to use xlrd to parse .xlsx files. I installed xlrd, but its latest version no longer supports .xlsx.

There is a bug report about this, with tips to pass the engine argument with the value openpyxl:
BUG: Cannot read XLSX files with xlrd version 2.0.0 #38410 (closed).

What I did to get things to work:
1.) Upgraded pandas and installed openpyxl (pip commands below).
2.) Added engine='openpyxl' to pandas.read_excel(), like:
pd.read_excel('message_types.xlsx', sheet_name='messages', engine='openpyxl')

pip install pandas --upgrade

pip install openpyxl --upgrade
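
For anyone who also has to read older .xls files, here is a minimal sketch of picking the engine from the file extension instead of hard-coding it. The helper names (pick_excel_engine, load_message_types) and the extension table are my own assumptions for illustration; they are not part of pandas or of the book's notebook.

```python
from pathlib import Path

# Hypothetical mapping (an assumption for illustration): openpyxl reads the
# newer .xlsx/.xlsm formats, while xlrd 2.x only reads legacy .xls files.
ENGINES = {'.xlsx': 'openpyxl', '.xlsm': 'openpyxl', '.xls': 'xlrd'}


def pick_excel_engine(path):
    """Return the pandas engine name that matches the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINES:
        raise ValueError(f'Unsupported Excel extension: {suffix!r}')
    return ENGINES[suffix]


def load_message_types(path='message_types.xlsx', sheet_name='messages'):
    """Read one sheet, passing the engine explicitly to pd.read_excel."""
    # Imported here so pick_excel_engine can be used without pandas installed.
    import pandas as pd
    return pd.read_excel(path, sheet_name=sheet_name,
                         engine=pick_excel_engine(path))
```

Calling load_message_types() with the defaults does the same thing as the read_excel call in step 2, provided openpyxl is installed.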

This worked for me. I'm not sure it was the right thing to do. I'm moving on, but thought I'd share in case anyone else is just starting and runs into this.

[EDIT]: I'm using the Docker image and noticed I was in the backtest environment. Not sure whether that caused the problem and whether I needed to be in ml4t instead.
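
A quick way to see which environment a notebook kernel is running in, and which versions of the relevant packages it has, is something like the sketch below. The function name report_versions is an assumption of mine; it only uses the standard library's importlib.metadata.

```python
import sys
from importlib import metadata


def report_versions(packages=('pandas', 'openpyxl', 'xlrd')):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# sys.prefix is the root directory of the active Python environment.
print(sys.prefix)
print(report_versions())
```

If openpyxl shows up as None, the engine='openpyxl' call will fail until it is installed in that same environment.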

  • Adrian

Hi @adrian, apologies for the long delay. Newer pandas versions do indeed require openpyxl to read .xlsx files; see the pandas release notes.

In case you haven’t noticed, there are now conda environments so you don’t need to use Docker anymore (see installation instructions).

Cheers, Stefan