Optimal Portfolio

Steps

Read stock close prices from the data source.
Generate risk and return data.
Analyze the data to find the tangent portfolio.
In Excel, read the CSV output together with your own positions for portfolio management.
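The analysis step can be illustrated with a minimal sketch. It is not the code in this repository: it assumes the textbook closed-form tangency weights, proportional to Sigma^-1 (mu - rf), with short positions allowed. The function name `tangency_weights`, the tickers `AAA`, `BBB`, `CCC`, and the synthetic prices are all hypothetical.

```python
import numpy as np
import pandas as pd


def tangency_weights(close: pd.DataFrame, risk_free_rate: float = 0.0,
                     periods_per_year: int = 252) -> pd.Series:
    """Closed-form tangency weights (shorting allowed), scaled to sum to 1."""
    returns = close.pct_change().dropna()        # simple periodic returns
    mu = returns.mean() * periods_per_year       # annualized mean return
    cov = returns.cov() * periods_per_year       # annualized covariance (risk)
    excess = mu - risk_free_rate                 # excess return over the risk-free rate
    # Solve Sigma @ raw = (mu - rf) instead of inverting Sigma explicitly.
    raw = np.linalg.solve(cov.to_numpy(), excess.to_numpy())
    return pd.Series(raw / raw.sum(), index=close.columns)


# Synthetic close prices for three hypothetical tickers.
rng = np.random.default_rng(0)
daily = rng.normal(loc=[0.0005, 0.0008, 0.0003],
                   scale=[0.01, 0.02, 0.008], size=(500, 3))
close = pd.DataFrame(100 * np.cumprod(1 + daily, axis=0),
                     columns=["AAA", "BBB", "CCC"])

weights = tangency_weights(close)
print(weights)
```

The normalization assumes the raw weights have a positive sum. If they do not (for example, when most excess returns are negative), the result is not a meaningful tangent portfolio and a constrained optimizer would be the safer choice.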

virtual environment setup

pip install virtualenv
py -m venv myenv  # venv ships with Python 3; "myenv" is the environment folder name
cd myenv/Scripts
.\pip.exe install -r ..\..\requirements.txt  # use the pip inside myenv\Scripts (Windows example)
.\pip.exe list
To point IntelliJ/PyCharm at myenv, see https://www.jetbrains.com/help/pycharm/creating-virtual-environment.html#python_create_virtual_env

Virtualenv Activate

.\myenv\Scripts\activate.bat  # put the virtualenv on the PATH for this shell

Run

python .\scripts\GenPortfolio3.py

Tips for pandas performance improvement

Python’s dynamic nature makes it slower than compiled languages. This issue is exacerbated in scientific computing because we run simple operations millions of times.
Don’t loop over your data. Instead, vectorize your operations (remember Hector Vector).
You can use numpy directly by calling .to_numpy() on the DataFrame, which can be even faster.
Choose the smallest possible numerical dtypes, use categoricals, and prefer float32 over the newish nullable dtypes (for now).
Use parquet to store your data. Use snappy compression (or none at all) during development, resorting to gzip for long-term archival.
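As a small illustration of the vectorization and `.to_numpy()` tips above, the sketch below computes the same column three ways. The column names `price` and `qty` are made up for the example, and no timings are shown because they depend on data size and machine.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Slow: a Python-level loop that touches one row at a time.
loop_total = [row.price * row.qty for row in df.itertuples()]

# Faster: vectorized arithmetic on whole columns.
df["total"] = df["price"] * df["qty"]

# Often faster still: operate on the underlying numpy arrays.
np_total = df["price"].to_numpy() * df["qty"].to_numpy()

print(loop_total, df["total"].tolist(), np_total.tolist())
```

All three produce the same values; the difference only shows up in runtime as the number of rows grows.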

dtype in DataFrames

In pandas DataFrames, the dtype is a critical attribute that specifies the data type of each column. It determines how much memory a column uses and how fast operations on it run, so selecting the appropriate dtype for each column is key.
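A quick way to see which dtypes pandas picked, and to override them, is sketched below. The column names are hypothetical, and the target dtypes are only an example of what might suit such data.

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "volume": [100, 250],
    "close": [10.5, 20.25],
})

# Inspect the inferred dtypes: integers and floats default to 64-bit.
print(df.dtypes)

# Explicitly choose narrower dtypes for the numeric columns.
df = df.astype({"volume": "int32", "close": "float32"})
print(df.dtypes)
```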

The object type

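By default, pandas stores strings with the object dtype, so it has to fall back on slow arrays of Python objects. A numpy numeric array keeps its values side by side in one contiguous buffer. An object column, like a Python list, instead holds a contiguous buffer of pointers; each pointer leads to an object stored elsewhere in memory, which may in turn reference data in other locations. That indirection makes object columns slow and memory-hungry.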
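The cost of the object dtype, and the benefit of categoricals for repetitive strings, can be checked with `memory_usage(deep=True)`. This is a minimal sketch on made-up ticker symbols; exact byte counts vary by pandas version and platform, so none are quoted.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 100,000 strings drawn from only three distinct values, stored as Python objects.
tickers = pd.Series(rng.choice(["AAPL", "MSFT", "GOOG"], size=100_000), dtype=object)

# A categorical stores each distinct string once plus a small integer code per row.
as_category = tickers.astype("category")

object_bytes = tickers.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(object_bytes, category_bytes)
```

The saving shrinks as the number of distinct values approaches the number of rows, so categoricals suit low-cardinality columns best.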

Numeric

Regarding ints and floats, downcasting is the key to saving memory. Pandas supports 8, 16, 32, and 64-bit signed and unsigned integers and 16, 32, and 64-bit floats. By default, it opts to use 64-bit variants for both types.

Downcasting numeric types to C primitive types:

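import numpy as np
import pandas as pd

n = 100_000

# Each column is named after the narrowest dtype that could hold its values.
df = pd.DataFrame({
    "uint8": np.random.randint(10, 20, n),
    "uint32": np.random.randint(100_000, 200_000, n),
    "int16": np.random.randint(1_000, 2_000, n) * np.random.choice((-1, 1), n),
    "float32": np.random.uniform(100_000, 200_000, n),
})

# Try each downcast in turn; pd.to_numeric leaves a column unchanged
# when the requested narrower type cannot hold its values.
df_downcasted = (
    df
    .apply(pd.to_numeric, downcast="float")
    .apply(pd.to_numeric, downcast="integer")
    .apply(pd.to_numeric, downcast="unsigned")
)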
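To confirm that downcasting paid off, compare dtypes and memory before and after. The sketch below rebuilds a smaller frame so that it runs on its own; the helper name `downcast_numeric` and the column names are hypothetical. Note the assumption about floats: `pd.to_numeric` only narrows a float column when the values survive the conversion within a tolerance, so a float column may stay at float64.

```python
import numpy as np
import pandas as pd


def downcast_numeric(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply the float, integer, and unsigned downcasts in sequence."""
    return (
        frame
        .apply(pd.to_numeric, downcast="float")
        .apply(pd.to_numeric, downcast="integer")
        .apply(pd.to_numeric, downcast="unsigned")
    )


n = 100_000
df = pd.DataFrame({
    "small_positive": np.random.randint(10, 20, n),
    "signed": np.random.randint(1_000, 2_000, n) * np.random.choice((-1, 1), n),
    "prices": np.random.uniform(100_000, 200_000, n),
})
df_downcasted = downcast_numeric(df)

before = df.memory_usage(deep=True).sum()
after = df_downcasted.memory_usage(deep=True).sum()

print(df_downcasted.dtypes)  # the float column may remain float64
print(f"{before} bytes -> {after} bytes")
```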

Reference

https://tryolabs.com/blog/2023/02/08/top-5-tips-to-make-your-pandas-code-absurdly-fast

