This is probably a basic problem in the first Jupyter notebook, but I have struggled with it for two days.
I tried to run …\02_market_and_fundamental_data\01_NASDAQ_TotalView-ITCH_Order_Book\01_parse_itch_order_flow_messages.ipynb, and it raises errors.
In cell #31 (the error is the "Cannot serialize" message):
Start of Messages
03:02:31.65 0
Start of System Hours
04:00:00.00 241,258
Start of Market Hours
09:30:00.00 9,559,279
09:44:09.23 25,000,000 00:00:52.34
Cannot serialize the column [primary_market_maker]
because its data contents are not [string] but [integer] object dtype
L
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214749 entries, 0 to 214748
Data columns (total 7 columns):
# Column Non-Null Count Dtype
I haven't solved the issue yet, but for the record, I'm seeing the same error message when running through the notebook.
"KeyError: 'No object named P in the file'"
However, the output from the cell that calls store_messages() shows that P messages are being parsed:
Start of Messages
03:02:31.65 0
Start of System Hours
04:00:00.00 241,258
Start of Market Hours
09:30:00.00 9,559,279
09:44:09.23 25,000,000 00:00:43.43
S
R
H
Y
L
Cannot serialize the column [primary_market_maker]
because its data contents are not [string] but [integer] object dtype
L
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214749 entries, 0 to 214748
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 stock_locate 214749 non-null int64
1 tracking_number 214749 non-null int64
2 timestamp 214749 non-null timedelta64[ns]
3 mpid 214749 non-null object
4 primary_market_maker 214749 non-null object
5 market_maker_mode 214749 non-null object
6 market_participant_state 214749 non-null object
dtypes: int64(2), object(4), timedelta64[ns](1)
memory usage: 11.5+ MB
None
S 1
U 1
Q 1
C 1
I 1
V 1
P 1
E 1
X 1
R 1
F 1
D 1
A 1
L 1
Y 1
H 1
J 1
Name: count, dtype: int64
J 1
V 1
S 3
H 8885
Q 8887
R 8887
Y 8926
C 9176
P 108412
L 214749
E 364951
F 836655
I 1072326
X 1086393
U 2132765
D 9044692
A 10094291
dtype: int64
Duration: 00:00:45.49
For whatever reason, P is not being appended to the HDF5 file, perhaps as a consequence of this error in the output:
Cannot serialize the column [primary_market_maker]
because its data contents are not [string] but [integer] object dtype
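One way to check this hypothesis is to ask pandas what it infers for the contents of each object column; the [string] / [integer] wording in the message appears to match the labels returned by pd.api.types.infer_dtype. A minimal sketch, assuming a made-up DataFrame in place of the parsed L messages (the helper name object_column_contents is hypothetical, not part of the notebook):

```python
import pandas as pd


def object_column_contents(df):
    """Map each object-dtype column to the content type pandas infers for it."""
    return {
        col: pd.api.types.infer_dtype(df[col], skipna=True)
        for col in df.columns
        if pd.api.types.is_object_dtype(df[col].dtype)
    }


# Toy stand-in for the parsed 'L' messages (all values are made up).
sample = pd.DataFrame({
    "stock_locate": [1, 2, 3],
    "mpid": pd.Series(["AAAA", "BBBB", "CCCC"], dtype=object),
    "primary_market_maker": pd.Series([1, 0, 1], dtype=object),
})

report = object_column_contents(sample)
print(report)
```

Any object column reported as something other than "string" is a candidate for the serialization error.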
Hi, I have the same problem, in my case under Anaconda (Python 3):
Cannot serialize the column [primary_market_maker]
because its data contents are not [string] but [integer] object dtype
L
<class 'pandas.core.frame.DataFrame'>
Here is the full output:
Start of Messages
03:02:31.65 25,000,000
Start of System Hours
04:00:00.00 25,241,258
Start of Market Hours
09:30:00.00 34,559,279
09:44:09.23 50,000,000 00:01:13.99
Cannot serialize the column [primary_market_maker]
because its data contents are not [string] but [integer] object dtype
L
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 429498 entries, 0 to 429497
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 stock_locate 429498 non-null int64
1 tracking_number 429498 non-null int64
2 timestamp 429498 non-null timedelta64[ns]
3 mpid 429498 non-null object
4 primary_market_maker 429498 non-null object
5 market_maker_mode 429498 non-null object
6 market_participant_state 429498 non-null object
dtypes: int64(2), object(4), timedelta64[ns](1)
memory usage: 22.9+ MB
None
S 1
U 1
Q 1
C 1
I 1
V 1
P 1
E 1
X 1
R 1
F 1
D 1
A 1
L 1
Y 1
H 1
J 1
Name: count, dtype: int64
J 2
V 2
S 6
H 17770
Q 17774
R 17774
Y 17852
C 18352
P 216824
L 429498
E 729902
F 1673310
I 2144652
X 2172786
U 4265530
D 18089384
A 20188582
dtype: int64
Duration: 00:01:20.92
pandas cannot correctly infer the type of the encoded columns, so we must convert them explicitly.
Update format_alpha() like this:
def format_alpha(mtype, data):
    """Process byte strings of type alpha"""
    for col in alpha_formats.get(mtype).keys():
        if mtype != 'R' and col == 'stock':
            data = data.drop(col, axis=1)
            continue
        data.loc[:, col] = data.loc[:, col].str.decode("utf-8").str.strip()
        if encoding.get(col):
            data.loc[:, col] = data.loc[:, col].map(encoding.get(col))
            data[col] = data[col].astype(int)  # convert to int
    return data
try:
    # Convert the encoded columns to categoricals before appending to the store
    if 'primary_market_maker' in data.columns:
        data['primary_market_maker'] = data['primary_market_maker'].astype('category')
    if 'buy_sell_indicator' in data.columns:
        data['buy_sell_indicator'] = data['buy_sell_indicator'].astype('category')
    store.append(mtype,
                 data,
                 format='t',
                 min_itemsize=s,
                 data_columns=dc)
# ... rest of the try block is not shown
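try:
    for col_name in data.columns:
        if pd.api.types.is_object_dtype(data[col_name].dtype):
            try:
                # .str raises AttributeError if the column does not hold strings
                data.get(col_name).str
            except AttributeError:
                # Non-string object column: store it as small integers instead
                data[col_name] = pd.Series(data[col_name], dtype=np.int8)
    # ... rest of the try block is not shown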
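The same idea can be sketched without probing .str: ask infer_dtype what an object column contains and convert only the columns whose contents are all integers. This is an illustration under assumptions, not the notebook's code. The helper name coerce_integer_object_columns is hypothetical, the sample values are made up, and int64 is used rather than int8 so that nothing is assumed about the size of the codes.

```python
import pandas as pd


def coerce_integer_object_columns(df):
    """Return a copy in which object columns holding only integers become int64."""
    out = df.copy()
    for col in out.columns:
        is_object = pd.api.types.is_object_dtype(out[col].dtype)
        if is_object and pd.api.types.infer_dtype(out[col], skipna=False) == "integer":
            out[col] = out[col].astype("int64")
    return out


# Toy frame: one genuine string column, one object column of integer codes.
sample = pd.DataFrame({
    "mpid": pd.Series(["AAAA", "BBBB"], dtype=object),
    "primary_market_maker": pd.Series([1, 0], dtype=object),
})

fixed = coerce_integer_object_columns(sample)
print(fixed.dtypes)
```

Because the helper works on a copy, the original frame is left untouched, and string columns such as mpid keep their object dtype.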
Also, after applying @nghnam's fix, I got another error in the last lines of code, in the "Top Equities by Traded Value" section:
AttributeError: 'DataFrame' object has no attribute 'append'
This error is most likely due to a newer pandas version: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. As a quick workaround, the private method _append still works in place of append, but since it is not part of the public API, the supported replacement is pd.concat.
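For reference, a minimal sketch of replacing a DataFrame.append call with pd.concat. The frames and column names here are placeholders, not the notebook's actual variables:

```python
import pandas as pd

# Placeholder frames standing in for whatever the notebook appends.
existing = pd.DataFrame({"stock": ["AAA"], "value": [1.0]})
new_rows = pd.DataFrame({"stock": ["BBB"], "value": [2.0]})

# Before pandas 2.0 this would have been:
#   combined = existing.append(new_rows, ignore_index=True)
combined = pd.concat([existing, new_rows], ignore_index=True)
print(combined)
```

pd.concat takes a list of frames, so several appends in a loop can usually be collapsed into a single call at the end.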