Steps
- read stock closing prices from the data source
- generate risk and return data
- run the analysis to find the tangency portfolio
- read the CSV file input and your own positions in Excel for portfolio management
- pip install virtualenv
- py -m venv myenv  # venv for Python 3; env folder name: myenv
- cd myenv/Scripts
- .\pip.exe install -r ..\..\requirements.txt  # uses the pip inside myenv/Scripts (Windows example)
- .\pip.exe list
- configure IntelliJ to use myenv as the Python interpreter; see https://www.jetbrains.com/help/pycharm/creating-virtual-environment.html#python_create_virtual_env
- .\myenv\Scripts\activate.bat  # activate the virtualenv (puts myenv\Scripts first on PATH)
- python .\scripts\GenPortfolio3.py
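The first three steps can be sketched in a few lines of pandas and NumPy. This is a minimal illustration, not the logic of GenPortfolio3.py: the tickers (AAA, BBB, CCC), the simulated prices, the 252-trading-day annualization and the 3% risk-free rate are all assumptions. It uses the closed-form maximum-Sharpe (tangency) solution with short selling allowed, where the weights are proportional to inv(cov) @ (mu - rf) and then rescaled to sum to 1.

```python
import numpy as np
import pandas as pd

# Step 1 stand-in: simulated daily closes for three hypothetical tickers.
rng = np.random.default_rng(42)
dates = pd.bdate_range("2023-01-02", periods=252)
daily = rng.normal(loc=[0.0006, 0.0004, 0.0002], scale=[0.02, 0.015, 0.01], size=(252, 3))
close = pd.DataFrame(100 * np.cumprod(1 + daily, axis=0),
                     index=dates, columns=["AAA", "BBB", "CCC"])

# Step 2: simple daily returns, then annualized mean and covariance
# (assumes 252 trading days per year).
returns = (close / close.shift(1) - 1).dropna()
mu = returns.mean() * 252
cov = returns.cov() * 252

# Step 3: tangency portfolio, w proportional to inv(cov) @ (mu - rf),
# rescaled so the weights sum to 1. Negative weights mean short positions.
rf = 0.03  # assumed annual risk-free rate
raw = np.linalg.solve(cov.to_numpy(), (mu - rf).to_numpy())
# The rescaling only yields the maximum-Sharpe portfolio when raw.sum() > 0.
weights = pd.Series(raw / raw.sum(), index=mu.index)

port_ret = weights @ mu
port_vol = np.sqrt(weights @ cov @ weights)
sharpe = (port_ret - rf) / port_vol

print(weights.round(4))
print(f"return={port_ret:.2%}  vol={port_vol:.2%}  sharpe={sharpe:.2f}")

# Step 5 stand-in: CSV text that Excel can open (pass a path to write a file).
csv_text = weights.rename("weight").to_csv()
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of the covariance matrix; a long-only version would need a numerical optimizer instead of this closed form.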
- Python’s dynamic nature makes it slower than compiled languages. This issue is exacerbated in scientific computing because we run simple operations millions of times.
- Don’t loop over your data. Instead, vectorize your operations (remember Hector Vector).
- You can use NumPy directly by calling .to_numpy() on a Series or DataFrame, which can be even faster.
- Choose the smallest possible numerical dtypes, use categoricals, and prefer float32 over the newish nullable dtypes (for now).
- Use parquet to store your data. Use snappy compression (or none at all) during development, resorting to gzip for long-term archival.
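A minimal sketch of the loop-versus-vectorize advice and the `.to_numpy()` tip, using a hypothetical frame with `price` and `qty` columns: the same notional value is computed with a Python loop, with a vectorized column operation, and with the raw NumPy arrays. Timings depend on the machine, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200_000
df = pd.DataFrame({
    "price": rng.uniform(10, 100, n),  # hypothetical columns
    "qty": rng.integers(1, 100, n),
})

def notional_loop(frame):
    # Python-level loop: one interpreted multiplication per row.
    return pd.Series([p * q for p, q in zip(frame["price"], frame["qty"])], index=frame.index)

def notional_vectorized(frame):
    # One vectorized operation over whole columns, executed in compiled code.
    return frame["price"] * frame["qty"]

def notional_numpy(frame):
    # Work on the underlying NumPy arrays, skipping pandas' index alignment.
    return frame["price"].to_numpy() * frame["qty"].to_numpy()

results = {}
for fn in (notional_loop, notional_vectorized, notional_numpy):
    start = time.perf_counter()
    results[fn.__name__] = fn(df)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{fn.__name__:>20}: {elapsed_ms:.1f} ms")
```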
In pandas DataFrames, the dtype is a critical attribute that specifies the data type for each column. Therefore, selecting the appropriate dtype for each column in a DataFrame is key.
Because pandas stores strings as Python objects (the object dtype), it has to fall back on slow arrays of pointers to those objects.
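To see the cost of object-dtype strings, the sketch below compares a repetitive string column stored as `object` with the same column converted to a categorical, as suggested in the tips above. The ticker values are hypothetical, and `deep=True` is needed so pandas also counts the Python string objects, not just the pointers.

```python
import numpy as np
import pandas as pd

# Explicit object dtype: each cell holds a pointer to a Python str object.
tickers = pd.Series(["AAPL", "MSFT", "GOOG"] * 100_000, dtype=object)

# Categorical: small integer codes plus one copy of each distinct string.
as_category = tickers.astype("category")

obj_bytes = tickers.memory_usage(index=False, deep=True)
cat_bytes = as_category.memory_usage(index=False, deep=True)
print(f"object: {obj_bytes:,} bytes  category: {cat_bytes:,} bytes")
print(as_category.cat.codes.dtype)  # smallest integer type that fits the codes
```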
In contrast to a NumPy array, a Python list holds a pointer to a memory-contiguous buffer of pointers; those point to objects stored elsewhere in memory, which can in turn reference data stored in other locations.
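The layout difference can be observed from Python itself. This sketch assumes CPython on a 64-bit platform: a NumPy int64 array keeps its values in one contiguous buffer, while `sys.getsizeof` on a list reports only its pointer buffer, with each int object sized separately.

```python
import sys

import numpy as np

values = list(range(1_000))
arr = np.array(values, dtype=np.int64)

# NumPy: the 1,000 integers themselves, packed into one contiguous buffer.
print(arr.flags["C_CONTIGUOUS"], arr.itemsize, arr.nbytes)  # True 8 8000

# List: the list object owns a contiguous buffer of pointers ...
pointer_buffer = sys.getsizeof(values)
# ... and every pointer leads to a full Python int object stored elsewhere.
# (Approximate: CPython caches small ints, so some objects are shared.)
int_objects = sum(sys.getsizeof(v) for v in values)
print(pointer_buffer, int_objects, pointer_buffer + int_objects)
```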
Regarding ints and floats, downcasting is the key to saving memory. Pandas supports 8, 16, 32, and 64-bit signed and unsigned integers and 16, 32, and 64-bit floats. By default, it opts to use 64-bit variants for both types.
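A small sketch of the default 64-bit dtypes and what manual downcasting saves, using `astype` after checking the value range against `np.iinfo`. The data is made up for illustration; `memory_usage(index=False)` reports just the column's data buffer in bytes.

```python
import numpy as np
import pandas as pd

ints = pd.Series(range(1_000))     # pandas infers int64
floats = pd.Series([0.5] * 1_000)  # pandas infers float64
print(ints.dtype, floats.dtype)    # int64 float64

# Only downcast when every value fits the target type's range.
lo, hi = np.iinfo(np.int16).min, np.iinfo(np.int16).max
small_ints = ints.astype(np.int16) if ints.between(lo, hi).all() else ints

# float32 keeps roughly 7 significant decimal digits; 0.5 is exact.
small_floats = floats.astype(np.float32)

print(ints.memory_usage(index=False), small_ints.memory_usage(index=False))      # 8000 2000
print(floats.memory_usage(index=False), small_floats.memory_usage(index=False))  # 8000 4000
```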
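Downcasting numeric types into C primitive types:

```python
import numpy as np
import pandas as pd

n = 100_000
# Column names hint at the target dtype for each value range
df = pd.DataFrame({
    "uint8": np.random.randint(10, 20, n),
    "uint32": np.random.randint(100_000, 200_000, n),
    "int16": np.random.randint(1_000, 2_000, n) * np.random.choice((-1, 1), n),
    "float32": np.random.uniform(100_000, 200_000, n),
})

# Each pass asks pd.to_numeric to shrink every column to a smaller dtype
# of the requested kind when its values still fit
df_downcasted = (
    df
    .apply(pd.to_numeric, downcast="float")
    .apply(pd.to_numeric, downcast="integer")
    .apply(pd.to_numeric, downcast="unsigned")
)

# Compare dtypes and total memory (bytes) before and after
print(df_downcasted.dtypes)
print(df.memory_usage(deep=True).sum(), df_downcasted.memory_usage(deep=True).sum())
```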
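The chain above runs every column through all three downcast modes in turn. As a hedged alternative, the sketch below uses a hypothetical `downcast_column` helper that picks a single `pd.to_numeric` mode per column from its dtype and sign, then checks the resulting dtypes and memory. The column names and value ranges mirror the example above but are otherwise arbitrary.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

def downcast_column(s: pd.Series) -> pd.Series:
    """Hypothetical helper: choose one pd.to_numeric downcast mode per column."""
    if is_float_dtype(s):
        return pd.to_numeric(s, downcast="float")
    if is_integer_dtype(s):
        # Non-negative integers can use unsigned types (uint8, uint16, ...).
        mode = "unsigned" if (s >= 0).all() else "integer"
        return pd.to_numeric(s, downcast=mode)
    return s  # leave non-numeric columns untouched

rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "small_positive": rng.integers(10, 20, n),
    "large_positive": rng.integers(100_000, 200_000, n),
    "signed": rng.integers(1_000, 2_000, n) * rng.choice([-1, 1], n),
    "real": rng.uniform(100_000, 200_000, n),
})

df_small = df.apply(downcast_column)
print(df_small.dtypes)
print(df.memory_usage(deep=True).sum(), "->", df_small.memory_usage(deep=True).sum())
```

Whether the float column actually shrinks to float32 depends on whether pandas judges the conversion close enough to the original values, so check `df_small.dtypes` rather than assuming it.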
https://tryolabs.com/blog/2023/02/08/top-5-tips-to-make-your-pandas-code-absurdly-fast