feat: use direct type casting instead of memcpy #688

miaoerduo · 2024-01-09T05:08:55Z

Hi, I'm trying to read the llama.cpp code, it's a very interesting and fantasy project.

Here is a small optimization: for the small object copy (int64_t, float, int16_t, etc), we just need to cast it, no need to make a func call memcpy.

Here is a simple example.

#include <string.h>
#include <vector>
#include <iostream>
#include <chrono>

#define ggml_memcpy_opt(dst, src, type) do { *(type *)(dst) = *(type *)(src); } while (0)

int main() {
    // cur time in ns

    std::vector<size_t> time_cost(3, 0);

    const size_t length = 10000;
    const size_t times = 10000;
    std::vector<float> src(length, 100.0);
    std::vector<int32_t> dst1(length, 100);
    std::vector<int32_t> dst2(length, 100);
    std::vector<int32_t> dst3(length, 100);

    for (int idx = 0; idx < times; ++ idx) {
        auto t1 = std::chrono::high_resolution_clock::now();
        memcpy(dst1.data(), src.data(), src.size() * sizeof(float));
        auto t2 = std::chrono::high_resolution_clock::now();
        for (size_t idx = 0; idx < src.size(); ++idx) {
            memcpy(&dst2[idx], &src[idx], sizeof(int32_t));
        }
        auto t3 = std::chrono::high_resolution_clock::now();
        for (size_t idx = 0; idx < src.size(); ++idx) {
            ggml_memcpy_opt(&dst3[idx], &src[idx], int32_t);
        }
        auto t4 = std::chrono::high_resolution_clock::now();

        time_cost[0] += std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
        time_cost[1] += std::chrono::duration_cast<std::chrono::nanoseconds>(t3 - t2).count();
        time_cost[2] += std::chrono::duration_cast<std::chrono::nanoseconds>(t4 - t3).count();
    }

    std::cout << "memcpy: " << time_cost[0] / times << " ns " << std::endl;
    std::cout << "for_loop memcpy: " << time_cost[1] / times << " ns " << std::endl;
    std::cout << "for_loop cast: " << time_cost[2] / times << " ns " << std::endl;
}

when compilee with -O2

the time cost should be on my PC:

memcpy: 1169 ns 
for_loop memcpy: 6541 ns 
for_loop cast: 3785 ns

when compiled with -O3 (I think the compiler has done some tricks):

memcpy: 1158 ns 
for_loop memcpy: 1224 ns 
for_loop cast: 1194 ns

theoretically for small object copy the direct casting should be faster than memcpy.

Hope I could contribute to this great project!

ggerganov

Thanks - nice contribution. The reason for these memcpy is that I never have to worry about pointer alignment and type punning errors, so I've adopted the practice to always memcpy and don't worry about any potential problems (although, these are C++ related concerns and don't apply to C anyway).

Looking at the PR, I suppose these are all safe changes, so it seems like a good change. Even more so that it removes a header.

slaren · 2024-01-09T09:29:33Z

The compiler is smart enough to replace the memcpys if it can prove that it is safe, but even if it wasn't, the memcpy are there because they are necessary to respect the strict aliasing rules.

The benchmark is not correct. Testing very small code is tricky, but we can always check the assembly:
https://godbolt.org/z/vaK8549G1

miaoerduo · 2024-01-09T10:05:28Z

The compiler is smart enough to replace the memcpys if it can prove that it is safe, but even if it wasn't, the memcpy are there because they are necessary to respect the strict aliasing rules.

The benchmark is not correct. Testing very small code is tricky, but we can always check the assembly: https://godbolt.org/z/vaK8549G1

oh yeah, it's right! Thanks for the review. I haven't used this tool to see the asm before. Let me close this PR.

feat: use direct type casting instead of memcpy

b57e77c

ggerganov approved these changes Jan 9, 2024

View reviewed changes

ggerganov requested a review from slaren January 9, 2024 09:01

miaoerduo marked this pull request as draft January 9, 2024 10:07

miaoerduo closed this Jan 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: use direct type casting instead of memcpy #688

feat: use direct type casting instead of memcpy #688

miaoerduo commented Jan 9, 2024

ggerganov left a comment

slaren commented Jan 9, 2024

miaoerduo commented Jan 9, 2024

feat: use direct type casting instead of memcpy #688

feat: use direct type casting instead of memcpy #688

Conversation

miaoerduo commented Jan 9, 2024

ggerganov left a comment

Choose a reason for hiding this comment

slaren commented Jan 9, 2024

miaoerduo commented Jan 9, 2024