Success stories using Scalene's optimization suggestions #554

emeryberger opened this issue Jan 23, 2023 · 4 comments

This thread is meant to document and collect successful optimizations suggested by Scalene (vs. the more general success stories given here: #58).

emeryberger commented Jan 23, 2023

2x speedup, memory consumption halved

Issue #31 includes the following code, which we turn into a function:

import numpy as np

def main1():
    x = np.array(range(10**7))
    y = np.array(np.random.uniform(0, 100, size=10**8))

Scalene (at commit 2c18f59) suggests the following optimization for this function:

# Proposed optimization:
# Vectorized operations are used to reduce the time and memory complexity of the code.
def main1():
    # After optimization
    x = np.arange(10**7) # np.arange is faster than np.array
    y = np.random.uniform(0, 100, size=10**8) # No need to wrap in np.array

This optimization speeds up the code by almost 2x and roughly halves memory consumption (from 1.5GB to 773MB).
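
For anyone who wants to reproduce the comparison, here is a minimal timing sketch (the harness below is mine, not part of the original report; it measures only wall-clock time, not the memory figures above):

import time
import numpy as np

def main1_original():
    x = np.array(range(10**7))                           # iterates over a Python range, converting element by element
    y = np.array(np.random.uniform(0, 100, size=10**8))  # redundant copy: uniform() already returns an ndarray

def main1_optimized():
    x = np.arange(10**7)                                 # allocates and fills the array directly in C
    y = np.random.uniform(0, 100, size=10**8)

for fn in (main1_original, main1_optimized):
    start = time.perf_counter()
    fn()
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s")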

emeryberger commented Jan 23, 2023

90x speedup

#58 (comment) presents the following code:

for i in range(n_features):
    for n in range(n_samples):
        subgrad[i] += (- y[n] * X[n][i]) if y[n] * (np.dot(X[n], w) + b) < 1 else 0
    subgrad[i] += self.lambda1 * (-1 if w[i] < 0 else 1) + 2 * self.lambda2 * w[i]

Scalene proposes the following optimization:

# Vectorized operations to replace for loops                                                                                                    
subgrad[:-1] = np.sum(-y[:, None] * X * (y * (X.dot(w) + b) < 1)[:, None], axis=0)
subgrad[:-1] += self.lambda1 * np.sign(w) + 2 * self.lambda2 * w
subgrad[-1] = np.sum(-y * (y * (X.dot(w) + b) < 1))

Scalene's proposed optimization accelerates the original code by roughly 90x (from 89 seconds to 1 second when running 500 iterations) and takes full advantage of multiple cores.

Original code:
# original code
    def subgradient(self, wb, X, y):
        """Compute the subgradient of the objective function.                                                                                           
                                                                                                                                                        
        Arguments:                                                                                                                                      
            wb (ndarray, shape = (n_features+1,)):                                                                                                      
                concatenation of the weight vector with the bias wb=[w,b]                                                                               
            X (ndarray, shape = (n_samples, n_features)):                                                                                               
                Training input matrix where each row is a feature vector.                                                                               
                The data in X are passed in without a bias column!                                                                                      
            y (ndarray, shape = (n_samples,)):                                                                                                          
                Training target. Each entry is either -1 or 1.                                                                                          
                                                                                                                                                        
        Returns:                                                                                                                                        
            subgrad (ndarray, shape = (n_features+1,)):                                                                                                 
                subgradient of the objective function with respect to                                                                                   
                the coefficients wb=[w,b] of the linear model                                                                                           
        """
        n_samples = X.shape[0]
        n_features = X.shape[1]

        w = wb[:-1]
        b = wb[-1]

        subgrad = np.zeros(n_features + 1)
        for i in range(n_features):
            for n in range(n_samples):
                subgrad[i] += (- y[n] * X[n][i]) if y[n] * (np.dot(X[n], w) + b) < 1 else 0
            subgrad[i] += self.lambda1 * (-1 if w[i] < 0 else 1) + 2 * self.lambda2 * w[i]

        for n in range(n_samples):
            subgrad[-1] += - y[n] if y[n] * (np.dot(X[n], w) + b) < 1 else 0

        return subgrad

Optimized code:
# Proposed optimization:                                                                                                                                
# This code has been optimized by replacing the for loops with vectorized operations. This reduces the time complexity from O(n^2) to O(n), resulting in a substantial speedup.                                                                                                                                
    def subgradient(self, wb, X, y):
        """Compute the subgradient of the objective function.                                                                                           
        Arguments:                                                                                                                                      
            wb (ndarray, shape = (n_features+1,)):                                                                                                      
                concatenation of the weight vector with the bias wb=[w,b]                                                                               
            X (ndarray, shape = (n_samples, n_features)):                                                                                               
                Training input matrix where each row is a feature vector.                                                                               
                The data in X are passed in without a bias column!                                                                                      
            y (ndarray, shape = (n_samples,)):                                                                                                          
                Training target. Each entry is either -1 or 1.                                                                                          
        Returns:                                                                                                                                        
            subgrad (ndarray, shape = (n_features+1,)):                                                                                                 
                subgradient of the objective function with respect to                                                                                   
                the coefficients wb=[w,b] of the linear model                                                                                           
        """
        n_samples = X.shape[0]
        n_features = X.shape[1]
        w = wb[:-1]
        b = wb[-1]
        # Vectorized operations to replace for loops                                                                                                    
        subgrad = np.zeros(n_features + 1)
        subgrad[:-1] = np.sum(-y[:, None] * X * (y * (X.dot(w) + b) < 1)[:, None], axis=0)
        subgrad[:-1] += self.lambda1 * np.sign(w) + 2 * self.lambda2 * w
        subgrad[-1] = np.sum(-y * (y * (X.dot(w) + b) < 1))
        return subgrad
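
As a quick sanity check (this harness is mine, not from the thread; the data shapes and lambda values are arbitrary illustration choices), the loop-based and vectorized subgradients can be compared as standalone functions on random data:

import numpy as np

def subgrad_loops(wb, X, y, lambda1, lambda2):
    # Direct transcription of the original double loop.
    n_samples, n_features = X.shape
    w, b = wb[:-1], wb[-1]
    subgrad = np.zeros(n_features + 1)
    for i in range(n_features):
        for n in range(n_samples):
            subgrad[i] += (-y[n] * X[n][i]) if y[n] * (np.dot(X[n], w) + b) < 1 else 0
        subgrad[i] += lambda1 * (-1 if w[i] < 0 else 1) + 2 * lambda2 * w[i]
    for n in range(n_samples):
        subgrad[-1] += -y[n] if y[n] * (np.dot(X[n], w) + b) < 1 else 0
    return subgrad

def subgrad_vectorized(wb, X, y, lambda1, lambda2):
    # Scalene's proposed vectorization.
    w, b = wb[:-1], wb[-1]
    violated = (y * (X.dot(w) + b) < 1)          # boolean mask of margin violations
    subgrad = np.zeros(X.shape[1] + 1)
    subgrad[:-1] = np.sum(-y[:, None] * X * violated[:, None], axis=0)
    subgrad[:-1] += lambda1 * np.sign(w) + 2 * lambda2 * w
    subgrad[-1] = np.sum(-y * violated)
    return subgrad

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.choice([-1.0, 1.0], size=200)
wb = rng.normal(size=21)
g_loop = subgrad_loops(wb, X, y, 0.01, 0.01)
g_vec = subgrad_vectorized(wb, X, y, 0.01, 0.01)
print("match:", np.allclose(g_loop, g_vec))      # expected: True

One subtle difference worth noting: np.sign(w) is 0 where a weight is exactly zero, whereas the original loop contributes +1 there; for continuously valued weights this almost never matters.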

emeryberger commented Jan 24, 2023

18x speedup over original; 1.8x speedup over manual optimization

#58 (comment) presents the following code (with some edits to put the bottleneck code into a function do_it()):

import numpy as np
import pandas as pd

column_names_example = [i for i in range(10000)]
index = pd.MultiIndex.from_tuples([("left", c) for c in column_names_example] + [("right", c) for c in column_names_example])
df = pd.DataFrame(np.random.rand(1000, 20000), columns=index)

def keep_column(left_col, right_col):
  return left_col[left_col.first_valid_index()] > right_col[right_col.last_valid_index()]

def do_it():
  v = [c for c in column_names_example if keep_column(df["left"][c], df["right"][c])]

That code takes about 6.54 seconds on an M1 MacBook.

After about a dozen attempts (most of which did not run), Scalene produced a solid optimization:

# Proposed optimization: Replaced list comprehension with vectorized operation. 
def do_it():
    v = df["left"].loc[df["left"].first_valid_index()] > df["right"].loc[df["right"].last_valid_index()]
    return v.index[v].tolist()

That code runs in 0.37 seconds, corresponding to an almost 18x increase in performance. The author of the comment had previously produced the following optimization:

def do_it():
    df_l = df["left"]
    df_r = df["right"]
    return [c for c in column_names_example if keep_column(df_l[c], df_r[c])]

That code takes 0.67 seconds, making Scalene's optimization roughly 1.8x faster.
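
A self-contained timing harness (mine, not part of the original comment; the data setup simply repeats the DataFrame construction above) that compares the three variants side by side:

import time
import numpy as np
import pandas as pd

column_names_example = list(range(10000))
index = pd.MultiIndex.from_tuples(
    [("left", c) for c in column_names_example]
    + [("right", c) for c in column_names_example]
)
df = pd.DataFrame(np.random.rand(1000, 20000), columns=index)

def keep_column(left_col, right_col):
    return left_col[left_col.first_valid_index()] > right_col[right_col.last_valid_index()]

def do_it_original():
    return [c for c in column_names_example if keep_column(df["left"][c], df["right"][c])]

def do_it_manual():
    df_l, df_r = df["left"], df["right"]
    return [c for c in column_names_example if keep_column(df_l[c], df_r[c])]

def do_it_scalene():
    v = df["left"].loc[df["left"].first_valid_index()] > df["right"].loc[df["right"].last_valid_index()]
    return v.index[v].tolist()

for fn in (do_it_original, do_it_manual, do_it_scalene):
    start = time.perf_counter()
    result = fn()
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s ({len(result)} columns kept)")

Since the random data contain no NaNs, all three variants should return the same list of columns.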

@ianozsvald commented:

@emeryberger you may be happy to hear that I've added Scalene to the Profiling chapter of the 3rd edition of our High Performance Python book, which is due for publication in 2025. Thanks!
