
msc placeholder: 5G overlay infrastructure for decentralised learning ??? #7258

Open
synctext opened this issue Jan 13, 2023 · 43 comments

@synctext

synctext commented Jan 13, 2023

Thesis defense target: 21 June 2024. Survey target: end of July 2023.
Would like a fresh master thesis topic, not an incremental improvement of other thesis work.
Starting roughly Q1 2023 or summer 2023, flexible. Update: literature survey starting 2 May.
Update 2: literature survey finished on 3 Oct 2023.

RTOS expertise. AWS. Dream of contributing to the Linux kernel. Byte-level stuff OK, even an assembly person in the age of JavaScript :-) Likes to use machine learning, but not to invent new ML or to make it the central focus of the thesis (no unsupervised learning, no online learning). Thus more ML that is: adversarial, Byzantine, decentralised, personalised, local-first AI, edge-devices only, low-power hardware accelerated. Prefers to use knowledge from the MSc Advanced Algorithms course.

Possible brainstorm starting idea: start building the fastest machine learning based on hardware acceleration. The first step is to get the hardware running fast, then stepwise modify the algorithms and tweak them towards machine learning for learn-to-rank, learn-through-consumption, or even learn-about-trust (reputation graph, work graph, MeritRank-inspired, etc.). Phones for testing were promised.

  • Applied ML direction {less interested}. Related work on the astronomical hardware cost of AI. OpenAI has spent at least $63M on cloud compute:
In February 2018, the Organization entered into a two year services agreement with Google, LLC for cloud computing
services. The terms include a minimum spend commitment of $63M during the service period which the Organization
had fully satisfied as of the date of financial statement issuance

https://rct.doj.ca.gov/Verification/Web/Download.aspx?saveas=560291.pdf&document_id=09027b8f803a8976 [source]

@synctext

synctext commented Mar 23, 2023

Concrete idea for NAT survey

This survey describes progress towards a fully connected Internet. Currently, mobile devices do not fully participate in the network: smartphones are unable to receive messages from others. Only Facebook, Google, and other servers in the cloud are able to communicate with billions of smartphone users. In the name of security, billions of users have a constrained network, without the freedom to communicate.

Find on Google Scholar:

Year scientific article or report
2000 SIP, NAT, and Firewalls - master thesis KTH
2003 Network convergence and the NAT/Firewall problems
2005 Characterization and measurement of tcp traversal through nats and firewalls
2006 Implementation and performance study of a new NAT/firewall signaling protocol
2008 A Better Approach than Carrier-Grade-NAT
2008 Free-riding, fairness, and firewalls in p2p file-sharing
2009 A measurement of NAT and firewall characteristics in peer-to-peer systems
2011 Delft work: UDP NAT and Firewall Puncturing in the Wild
2011 Tribler: P2p media search and sharing
2013 Assessing the impact of carrier-grade NAT on network applications
2013 Common requirements for carrier-grade NATs (CGNs)
2013 A Royal Opinion on Carrier Grade NATs
2013 BT Retail Tests IP Address Sharing
2014 On the performance and fairness of BitTorrent-like data swarming systems with NAT devices
2014 Deterministic Address Mapping to Reduce Logging in Carrier-Grade NAT Deployments
2016 Carrier-grade NAT—is it really secure for customers? A test on a Turkish service provider
2016 A multi-perspective analysis of carrier-grade NAT deployment
2016 Statistical network monitoring: Methodology and application to carrier-grade NAT
2016 OverUDP: Tunneling transport layer protocols in UDP for P2P application of IPv4
2018 Inferring carrier-grade NAT deployment in the wild
2018 IETF Internet Standard draft on Trustchain
2020 birthday paradox solution https://tailscale.com/blog/how-nat-traversal-works/
2020 https://github.com/danderson/nat-birthday-paradox
2021 A QUIC (K) Way Through Your Firewall?
2021 Hardware details, Fortigate: https://news.ycombinator.com/item?id=27489797
2022 How NAT traversal works — NAT notes for nerds
2023 Doomed to Repeat with IPv6? Characterization of NAT-centric Security in SOHO Routers

Figure taken from the master thesis of 2000: [image]

ToDo1: collect 30 citations on carrier-grade NAT and all these topics.
ToDo2: taxonomy list, https://www.rfc-editor.org/rfc/rfc3234

Finally, we investigated various telecom providers in the Netherlands regarding their NAT and blocking practices. We procured 12 SIM cards and measured their behaviour. See the full SIM-to-SIM connectivity matrix. Only 3 offer free Internet... {ToDo}.

TODO: register at https://mare.ewi.tudelft.nl/project 📝

@OrestisKan

@synctext, registration is for the thesis, not for the literature survey, right?

@synctext

Indeed, it's good to register your thesis as early as possible.

@OrestisKan

Literature_Survey.pdf

@synctext

synctext commented May 31, 2023

Feel free to add a bit more content on reproducing state-of-the-art literature.

The scientific problem of universal connectivity is not explained clearly. The storyline goes too fast: page 2 already mentions "port-restricted cone NAT". Take half a page for a tutorial on the concept of an incoming connection. Needs structure!

Section 5. Reproducing results from literature
After presenting the 34 relevant prior works covered in this survey, we now combine the state-of-the-art results. Using a practical experimental evaluation, we reproduced the best-of-class algorithms presented in the discovered literature. We confirmed the findings of the body of literature with our reproduction experiment. Our simple app reproduces the NAT penetration algorithms of the main literature [2,5,17]. The cardinal outcome of our experimental CGNAT evaluation is the success rate, something often lacking in studies. The success rate for various Dutch telecom providers is determined to be: 97%.
ETC.

EDIT: brainstorm about the master thesis focus. Idea for a title: "5G overlay infrastructure for edge-based decentralised learning". This is the context to sell your perfect_overlay effort. It only needs a few weeks for a minimum viable product of decentralised machine learning: simply take this gossip-based ML algorithm and its running code. Goal: 100 actual nodes {mixed real ARM Android and x86 Kotlin}!

@OrestisKan

OrestisKan commented May 31, 2023

  1. Polish text
  2. Create taxonomy table with the literature (from the literature survey from other student): https://arxiv.org/abs/2212.06436
  3. Create an app and test the NAT penetration rate using kotlin-ipv8

@synctext

synctext commented Jun 19, 2023

MARE: "5G overlay infrastructure for decentralised learning"

Update:

@synctext synctext changed the title msc placeholder: brainstorm on search, real-time, or hardware msc placeholder: 5G overlay infrastructure for decentralised learning ??? Jun 19, 2023
@synctext

synctext commented Jul 10, 2023

Goal: mechanism for one phone to help another phone to puncture their carrier-grade NAT.

@OrestisKan

Literature_Survey.pdf

@synctext

synctext commented Jul 17, 2023

Nearly done with the Lit Survey. [38] citations to forums and scientific papers. Great result to include:
puncture log
Just include an as-simple-as-possible description of the SIM cards from 6 different 4G/5G providers.

Research Assistant job: send 50 UDP packets and count how many arrive. Repeat for all SIM-card combinations. Test the performance of EVA; note that you can then quickly run out of your 100-ish MByte SIM data quota. Read Rahim's work on the binary transport protocol called EVA. See some example code: https://github.com/KoningR/eurotoken/blob/5c84348ba16dd9ce4b97e53ff52a5cefe9ee97c1/src/main/kotlin/evatest/EvaApplication.kt
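The "send 50 UDP packets, count how many arrive" test can be sketched as below. This is a minimal illustration under stated assumptions, not the lab's tooling: `count_delivered` is a hypothetical name, and both ends run on one machine over loopback so the sketch is self-contained; in the real experiment the receiver would sit behind the other SIM card's NAT. Each datagram carries a 4-byte sequence number so that duplicates are not counted twice.

```python
import socket


def count_delivered(n_packets: int = 50, host: str = "127.0.0.1",
                    wait_s: float = 1.0) -> int:
    """Send n_packets numbered UDP datagrams and count how many distinct ones arrive."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind((host, 0))          # let the OS choose a free port
    receiver.settimeout(wait_s)       # give up after wait_s seconds of silence
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    target = receiver.getsockname()

    for seq in range(n_packets):
        sender.sendto(seq.to_bytes(4, "big"), target)

    seen = set()
    try:
        while len(seen) < n_packets:
            data, _ = receiver.recvfrom(2048)
            seen.add(int.from_bytes(data[:4], "big"))  # dedupe by sequence number
    except socket.timeout:
        pass  # packets still missing at this point are counted as lost
    finally:
        sender.close()
        receiver.close()
    return len(seen)


if __name__ == "__main__":
    print(f"{count_delivered()} of 50 packets arrived")
```

Repeating this for every SIM pair fills the connectivity matrix; the payload is only 4 bytes per packet, so the data quota concern applies mainly to the EVA tests.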

@OrestisKan

OrestisKan commented Aug 18, 2023

Literature_Survey (2).pdf

  • Add the pictures with the explanations my dad suggested
  • Red/green table showing which carriers failed and succeeded, and why Lyca failed
  • Picture of the SIM cards, the phones, etc.
  • Also the picture above (17 July)

Lyca uses a symmetric NAT. The rest (Lebara, T-Mobile and Vodafone) could communicate with each other, while they all failed with Lyca (even Lyca-to-Lyca communication failed). In theory, Lyca-to-Lyca communication may be achieved using the birthday paradox. We need to determine the address and port predictability in order to understand how long it would take to penetrate the NAT and how long it would take for Lyca to block the requests.
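Determining port predictability starts with classifying each carrier's mapping behaviour. The sketch below uses the mapping categories of RFC 4787 (endpoint-independent, address-dependent, address-and-port-dependent; the last is what is commonly called a symmetric NAT). It assumes a hypothetical log in which the phone sends from one local port to several server addresses/ports and the server records the external port it sees; `classify_mapping` is an illustrative name, not part of IPv8.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# (local_port, dst_ip, dst_port, external_port_seen_by_server)
Observation = Tuple[int, str, int, int]


def classify_mapping(observations: Iterable[Observation]) -> str:
    """Classify NAT mapping behaviour using RFC 4787 terminology."""
    by_local = defaultdict(dict)
    for local_port, dst_ip, dst_port, ext_port in observations:
        by_local[local_port][(dst_ip, dst_port)] = ext_port

    verdict = "endpoint-independent"
    for mapping in by_local.values():
        if len(set(mapping.values())) <= 1:
            continue  # same external port for every destination
        # Does the external port depend only on the destination IP?
        per_ip = defaultdict(set)
        for (dst_ip, _), ext in mapping.items():
            per_ip[dst_ip].add(ext)
        if all(len(ports) == 1 for ports in per_ip.values()):
            verdict = "address-dependent"
        else:
            # A new external port even for another port on the same IP: symmetric.
            return "address-and-port-dependent"
    return verdict
```

Distinguishing all three categories requires probing from at least two server IPs and two ports per IP; a Lyca-style symmetric NAT would be reported as `address-and-port-dependent`.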

Willingness to travel (possibly with accommodation available?):

  • Luxembourg
  • France
  • Italy
  • Germany
  • Croatia
  • UK
  • Ireland
  • Greece
  • Cyprus
  • Portugal
  • Belgium
  • Austria
  • Romania
  • North Macedonia
  • Bosnia and Herzegovina
  • Serbia
  • New Jersey
  • Vancouver Canada
  • Australia
  • Egypt
  • Chile

Reason for travelling: live physical testing of 4G and 5G communications and procurement of SIM cards.

Research assistantship ending 30/09/23!

@synctext

synctext commented Aug 18, 2023

  • €50 for Cyprus based 3G/4G SIM cards for local experimental study
  • "The goal of the app was to determine whether IPv8 could make two phones, each on a different carrier's 5G network running kotlin-ipv8, and a computer on WiFi running the JVM version of IPv8, discover each other and communicate by penetrating any NATs in the way." Please replace this with more scientific wording and leave out all engineering details. Suggestion: "With real-world 4G/5G SIM cards we experimentally determined the efficacy of the NAT puncturing methods described in the literature. For this purpose we significantly refactored a networking library called 'IPv8', developed in 2020 at Delft University of Technology. The IPv8 library uses Kotlin to implement the UDP NAT puncturing approach documented in an expired IETF Internet Standard draft from 2018 [REF]."
  • Action for adviser: try to allocate €5000 (???) for hardware and travel expenses for on-site EU-wide networking experiments within the master thesis or a Research Assistant position in the Tribler Lab. Back-of-the-envelope calculation: 50 SIM cards x €20 = €1000, leaving a travel budget of €4000. Android machine learning is working. With @quintene's port of TensorFlow Lite to Android, we could together test "Decentralised AI beyond federated learning on 5G".
  • IPv6 is happening after 30 years, needs testing. https://news.ycombinator.com/item?id=32798003
  • Discussing post-Delft degree options: responsible AI for 5G grant funding https://www.ngi.eu/opencalls/#ngizeroreview
  • Has Amazon Big Tech experience
  • Ambitious master thesis {or goal of entire lab this year}, greatly expanding the scope (builds on your assembly & Big Tech experience)
    1. flawless EU-wide NAT puncturing
    2. Effortless UDP-based binary transfer between any two phones
    3. Cryptographic key pair for any unique smartphone for privacy-respecting self-sovereign identity (IPv8 ID)
    4. Cryptographic certificates of rendezvous between any two phones
    5. Minimal viable realisation of ranking function and MeritRank, see https://arxiv.org/pdf/2308.07148.pdf
    6. Gossip exchange with bias to peers ranked as trustworthy
    7. Decentralised AI: integrate gossip exchange with @quintene for decentralised machine learning (BeyondFederated - truly decentralised learning at the edge #7254).

@OrestisKan

Final Literature Survey with the suggested improvements
Literature_Survey.pdf

@synctext

synctext commented Sep 11, 2023

Comments on this latest survey:

  • The title could be more informative by mentioning the survey and carrier-grade NAT: "Survey and experiments on carrier-grade NATs".
  • "Internet connectivity nowadays has become a fundamental necessity.": the opening line flows better without "nowadays".
  • "While CGNAT has been successful in conserving IPv4 addresses": it is more a technique for maximising the usage of IPv4 addresses.
  • "NAT Punctuting", "stealling": typo stuff. No spell checking? 🤕
  • "TABLE I: Overview of all peer-to-peer techniques to establish communication behind NATs" order by year?
  • Figure 3: " Two machines behind Firewalls using a synchronizer", please expand fully to 1 or 2 columns (+ fix "send sdummy")
  • "the maximum packets per second that a machine can send is ≈ 56000,", confusing, too little info, birthday paradox?
  • "C. Peers where both are behind an EDM-based NAT", define the cardinal performance parameter of NAT Hole Opening Time. If 10k packets get out per second and any UDP hole created by any single packet is valid for 30s, we obtain a birthday paradox match with 300k UDP packets! If both sides have similar parameters we get 300k x 300k pairings and you only need 1 match. Right?? 🔢
  • "behind an EDM-based NAT to achieve an almost instant collision with 99.9% probability.", puncture
  • "TABLE II: Carriers that succeeded in transmitting a package to another carrier" 👍 🎊 👍
  • "FIG. 9: The sims used for testing": great 12-SIM transport box with X-ray shielding. Please maximise its size, and also that of Figure 8 {make full page width}, for readability. Move it much earlier: not near the references, but before presenting results, where you show the setup.
  • "11 years after the launch of IPv6", check your dates. RFC 1883, Proposed Standard
  • Add IPv8 outdated IETF draft Internet Standard, https://datatracker.ietf.org/doc/html/draft-pouwelse-trustchain-01#section-4
  • Survey needs full bib info: "I. Livadariu, K. Benson, A. Elmokashfi, A. Dhamdhere, and A. Dainotti, “Inferring carrier-grade nat deployment in the wild,” 2018."
  • Mention the use of fixed ICMP echo request packets: can they test your connection? Mention the reply trick: https://samy.pl/pwnat/
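The hole-opening-time reasoning above can be made concrete with a small calculator. It assumes the one-side-easy, one-side-hard model from the Tailscale write-up cited in the survey table: the hard side keeps `holes` external ports open towards the easy side's known address, and the easy side probes uniformly random ports (with replacement) on the hard side's public IP. The function names are hypothetical, and capping holes at the port space is the key correction to the raw packets-times-lifetime product.

```python
import math

PORT_SPACE = 65535  # UDP ports per public IPv4 address (approximation)


def open_holes(packets_per_s: float, hole_lifetime_s: float,
               port_space: int = PORT_SPACE) -> int:
    """Holes alive at once: one per packet sent within the lifetime, capped by the port space."""
    return int(min(packets_per_s * hole_lifetime_s, port_space))


def hit_probability(holes: int, probes: int, port_space: int = PORT_SPACE) -> float:
    """P(at least one uniformly random probe lands on one of `holes` open ports)."""
    return 1.0 - (1.0 - holes / port_space) ** probes


def probes_needed(holes: int, target: float, port_space: int = PORT_SPACE) -> int:
    """Smallest number of random probes whose success probability reaches `target`."""
    if holes >= port_space:
        return 1  # every port is open, so any probe hits
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - holes / port_space))
```

With the numbers from the comment, `open_holes(10_000, 30)` is capped at 65535: a single public IP cannot hold 300k distinct holes, and real carrier-grade NATs usually cap per-subscriber mappings far lower. With a more modest 256 holes, `hit_probability(256, 1024)` is about 0.98 and `probes_needed(256, 0.999)` is 1765.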

@OrestisKan

Literature_Survey (1).pdf
Latest (hopefully final) version with all the suggestions for improvements that you requested

@OrestisKan

OrestisKan commented Sep 20, 2023

@synctext The birthday attack between a phone on Vodafone 5G and an emulator on eduroam WiFi worked and they managed to connect. It still needs optimisation because it is heavy, but at least we know it works! More details in my Slack message.

What's left to do:

  • Make it even more lightweight
  • Slow down the request sending so it does not get flagged as a potential attack, and send requests at slightly random intervals?
  • Dynamic "reset" simulations by figuring out the NAT model
  • Different behaviour for easy versus hard NATs, for speedup, since a birthday attack is not always needed: each peer should first figure out its NAT type. If it is easy, try fixed ports and then port enumeration; if one side is hard, use the birthday attack
  • Gather network diagnostics?
  • Unit tests
  • Clean up code
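The easy-versus-hard strategy in the list above can be sketched as a port-candidate generator. Everything here is an illustrative assumption: `candidate_ports` and `first_n` are hypothetical names, "easy" is taken to mean endpoint-independent mapping where a port learned through a rendezvous peer is likely to be reused, and the outward sweep around that port is one possible reading of "enumeration of ports".

```python
import itertools
import random
from typing import Iterator, List, Optional


def candidate_ports(peer_is_easy: bool, known_port: int,
                    seed: Optional[int] = None) -> Iterator[int]:
    """Yield destination ports to probe on the peer's public IP, best guesses first."""
    if peer_is_easy:
        # Easy NAT: try the advertised port first, then sweep outwards from it.
        yield known_port
        for delta in range(1, 65536):
            for port in (known_port + delta, known_port - delta):
                if 1 <= port <= 65535:
                    yield port
    else:
        # Hard (symmetric) NAT: no prior, so probe random ports (birthday attack).
        # Shuffling once samples without replacement, so no port is probed twice.
        rng = random.Random(seed)
        ports = list(range(1024, 65536))
        rng.shuffle(ports)
        yield from ports


def first_n(ports: Iterator[int], n: int) -> List[int]:
    """Take the first n candidates, e.g. to respect a per-attempt packet budget."""
    return list(itertools.islice(ports, n))
```

For example, `first_n(candidate_ports(True, 40000), 5)` gives `[40000, 40001, 39999, 40002, 39998]`. Sending these at jittered intervals (the rate-limiting item above) is independent of which generator is used.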

@OrestisKan

OrestisKan commented Oct 3, 2023

@synctext

synctext commented Oct 3, 2023

Solid progress! Survey completed, now ready for Arxiv submission.
Thesis brainstorm: link the TensorFlow Lite port that Quinten van Es got operational to the birthday attack. Get a healthy IPv8 overlay. Focus on binary transfer for "decentralised Artificial Intelligence". Fix the "information diffusion problem". Measure UDP bandwidth throughput. Also the EVA protocol (this whole issue: warning, bad code 😷). Determine the bottleneck. Improve. Write thesis. DONE!!!

Improve the activity-grid principle showing the status of each of the 25 connected IPv8 peers. [image]

Related IPFS work: https://github.com/plprobelab/network-measurements/blob/master/results/rfm15-nat-hole-punching.md
The measurement was designed to provide insights into when and why the DCUtR protocol fails in NAT hole punching and to provide recommendations for improvement. In total, we tracked 6.25M hole punches from 212 clients (API keys). The clients were deployed in 39 different countries and hole punched remote peers in 167 different countries. Our top findings were that: libp2p’s hole punching success rate is around 70%.
https://research.protocol.ai/publications/decentralized-hole-punching/

@OrestisKan

OrestisKan commented Oct 17, 2023

THESIS TITLE (draft): First 5G deployment of Distributed Artificial Intelligence

IEEE_Conference_Template.pdf

Measure: UDP bandwidth, bottlenecks, timeouts on the Android client and NATs, connection reset time and port association time, all conditions that make successful communication possible, and gain a complete understanding of all factors that cause a communication failure. Determine whether there is an upper bound to the number of concurrent IPs that a device can talk to (e.g. 63 works and adding a 64th may break the least recently used).

Reliable data transfer: compare UDP and EVA protocol in terms of effective throughput, packet loss, congestion

Measure the exact NAT behaviour!

Measure NAT hole opening time!

I have 10 or 12 operational SIM cards and two phones, hence I can use 2 SIM cards at a time.

@synctext

synctext commented Oct 17, 2023

Update: "This is brute forcing the public IP"{+port}, a nice and sharp description of your work given by somebody from Canada.

@OrestisKan

OrestisKan commented Nov 8, 2023

The survey will be announced by arXiv tomorrow.
I added tests for:

  • UDP bandwidth measurement
  • NAT reset time

TODO:

  • Integrate Birthday attack & measurements into IPv8
  • Add the rest of the measurement tests mentioned on the comment above
  • Understand IPv8 codebase
  • Create documentation giving a code overview of IPv8 as a guide through the codebase. The existing docs (https://github.com/Tribler/kotlin-ipv8/tree/master/doc) only contain tutorials on how to use the API, with no explanation of the engine
  • Publish my codebase

Goal by Christmas:

  1. Quantify all measurements for the simcards
  2. Integrate the birthday attack into IPv8

@OrestisKan

OrestisKan commented Nov 9, 2023

Lit Survey is published: https://arxiv.org/abs/2311.04658

Edited to fix the broken reference link

@OrestisKan

OrestisKan commented Nov 16, 2023

I HAVE CODE FOR:

Measuring:

  • UDP bandwidth

  • bottlenecks

  • Round-trip time of packets using "ping-pong"

  • timeouts on Android clients and NATs

  • connection reset time and port association time

  • all possible conditions that make successful communication possible

  • Complete understanding of all possible factors that cause a communication failure.

  • Determine whether there is an upper bound to the number of concurrent IPs that a device can talk to (e.g. 63 works and adding a 64th may break the least recently used).

  • Reliable data transfer: compare UDP and EVA protocol in terms of effective throughput, packet loss, congestion

  • Measure the exact NAT behaviour!

  • Measure NAT hole opening time!

@synctext

synctext commented Nov 29, 2023

  • First graph for the next meeting, please; otherwise we are behind schedule. X-axis: name of the {5} provider {pair}; Y-axis: the timeout in seconds. Green dot for a successful measurement, red cross for a failure (scatter-plot style; add randomness to the counter start).
  • Focus on 2 master courses during Christmas
  • {repeating} getting a measurement infrastructure going (week of effort??). Get the first results graph!
  • two goals:
    1. Measure and understand SIM-card behaviour. Simplest infrastructure == central server. Or just two terminal windows into 2 phones with manual command-line options --receiver --birthday-attack.
      • First test a 4G provider against a SIM of the same provider (no Lebara-Vodafone mixing yet)
      • Binary search for the exact timeout is expensive, because every failed probe breaks the mapping; so use slowly increasing counters.
      • No idea yet on the complexity of these carrier-grade NATs behaviour and internal state.
    2. Decentral IPv8 NAT-traversal. Fully distributed. No server needed.
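The "slowly increasing counters" idea can be sketched as a linear ramp over the idle gap. Because a failed probe may destroy the mapping, the ramp stops at the first failure instead of bisecting (which could break the mapping many times). `find_idle_timeout` and the `survives_idle` callback are hypothetical; the callback stands in for the real phone-to-server exchange (stay silent for `gap` seconds, send one packet, report whether it arrived) and is simulated in the usage line.

```python
from typing import Callable, Optional


def find_idle_timeout(survives_idle: Callable[[float], bool],
                      start_s: float = 5.0, step_s: float = 5.0,
                      max_s: float = 600.0) -> Optional[float]:
    """Return the first idle gap (s) after which the mapping no longer delivers.

    The gap grows linearly, so the mapping is broken at most once per run.
    The true timeout lies in (returned_gap - step_s, returned_gap].
    """
    gap = start_s
    while gap <= max_s:
        if not survives_idle(gap):
            return gap
        gap += step_s
    return None  # no timeout observed up to max_s
```

Against a simulated NAT with a 42 s timeout, `find_idle_timeout(lambda gap: gap <= 42)` returns `45.0`, i.e. the timeout lies in (40, 45] seconds; a smaller `step_s` trades measurement time for precision.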

@OrestisKan

OrestisKan commented Dec 5, 2023

The Github repo of the research

View data-gathering progress in this Google Sheet

@OrestisKan

OrestisKan commented Dec 18, 2023

The first results show that the success rate of the birthday attack is low and very dependent on the provider, as can be seen [here](https://docs.google.com/spreadsheets/d/1hmGZ38y3Cngt8hsbJbR7SoZpRnAUu7uKivV9ODkhKSs/edit?usp=sharing). I propose to gather data on the mapping of the NAT. A server listens for incoming packets from a phone and logs the return address:port, while the phone does the same (logging the address:port it sent each packet from). The results can then be compared to reverse-engineer the mapping function of each NAT. This can be used to reduce the collision space (now 65535^2). According to RFC 4787, NAT port mapping behaves differently for different port ranges, so identifying the "convenient" ranges for each carrier will allow us to reduce the collision space and increase the connectivity rate!
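A minimal sketch of the proposed comparison, assuming each packet carries a sequence number so that the phone's log (local port per packet) and the server's log (external port observed per packet) can be joined. `mapping_deltas` is a hypothetical helper: a run of deltas equal to 1 would indicate a "+1" allocation counter, and lost packets simply drop out of the join.

```python
from typing import Dict, List, Optional, Tuple


def mapping_deltas(phone_log: Dict[int, int],
                   server_log: Dict[int, int]) -> List[Tuple[int, int, Optional[int]]]:
    """Join both logs on sequence number.

    phone_log:  {seq: local port used by the phone}
    server_log: {seq: external port observed by the server}
    Returns (local_port, external_port, delta_vs_previous_external) in sending order.
    """
    rows = []
    prev_ext = None
    for seq in sorted(phone_log.keys() & server_log.keys()):
        ext = server_log[seq]
        delta = None if prev_ext is None else ext - prev_ext
        rows.append((phone_log[seq], ext, delta))
        prev_ext = ext
    return rows
```

Note that a lost packet inflates the next delta (here 2 instead of 1), so losses should be checked before concluding that the NAT skipped a port.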

@synctext
Copy link
Member Author

synctext commented Dec 18, 2023

Idea of a "biased birthday attack" if you know the port range, the behaviour of the 4G/5G provider used, or even the mapping function itself (trivial +1 counter).
Portuguese and Greek SIM cards will probably also be procured.
Six years ago the superapp would show "NL KPN" for people you are connected to.

@OrestisKan

OrestisKan commented Jan 17, 2024

Update on data gathering: the Android app that will spam the server is ready. The server was very hard to build because of the 65k simultaneous processes. Yesterday I managed to run 30k ports successfully; going beyond 30k throws an exception because it runs out of memory on a machine with 16GB RAM. So I emailed Sandip yesterday asking whether he could give me a 64GB server; still waiting for a reply.
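One alternative to a process (or thread) per port, offered as an untested suggestion rather than the approach used here: a single process that binds many UDP sockets and waits on all of them with the standard-library `selectors` module (epoll on Linux). Each socket then costs a file descriptor and a kernel buffer rather than a whole process, though the open-file limit (`ulimit -n`) must be raised to cover ~65k sockets. `open_listeners` and `drain` are hypothetical helper names.

```python
import selectors
import socket
from typing import Iterable, List, Tuple


def open_listeners(ports: Iterable[int], host: str = "0.0.0.0") -> selectors.BaseSelector:
    """Bind one non-blocking UDP socket per port and register all of them with one selector."""
    sel = selectors.DefaultSelector()
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setblocking(False)
        sock.bind((host, port))  # port 0 lets the OS pick a free port
        sel.register(sock, selectors.EVENT_READ)
    return sel


def drain(sel: selectors.BaseSelector, timeout: float = 0.5) -> List[Tuple[int, tuple, bytes]]:
    """Log (local_port, sender_address, payload) until `timeout` seconds pass with no traffic."""
    log = []
    while True:
        events = sel.select(timeout)
        if not events:
            return log
        for key, _ in events:
            sock = key.fileobj
            data, addr = sock.recvfrom(2048)
            # addr is the (ip, port) the NAT mapped the phone to: the value to log.
            log.append((sock.getsockname()[1], addr, data))
```

A real server would call `open_listeners(range(1024, 65536))` and keep draining in a loop; the mapping data of interest is the sender address in each log entry.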

My idea is to change the birthday attack based on the gathered data, hoping to improve it. That repo will then become a generic, public birthday-attack library for Android connectivity, published via Gradle.

There's no plan to use IPv8 as a dependency.
Note: buying sim cards from random countries is useless without physically using them in the country with the local network
Plan for the next 3 weeks:

  • Pass my Economics exam
  • Gather data from Lyca, Odido, Vodafone, Lebara
  • Start processing the data and identifying useful patterns

@synctext

@Apple1D Indeed, strong authentication and identity management have been ignored by computer science for too long. There is also no industry support, as it's not a golden money maker. Governments have decades of failures and many losses trying to craft this societal infrastructure. See our scientific analysis: https://arxiv.org/pdf/2401.05239.pdf

@Tribler Tribler deleted a comment from OrestisKan Jan 17, 2024
@synctext

synctext commented Jan 17, 2024

Please keep track of your planning. In Feb 2024 we need to hold your master thesis progress moment.

Date Milestone
20 Sep 2023 First ever successful Vodafone 5G birthday attack 👏 🎉
Nov 2023 first code for UDP bandwidth, port association, ping time, etc
Dec 2023 understanding of NAT mapping
Jan 2024 generic library
Feb-May 2024 4G/5G measurement inside various EU countries
Feb-March 2024 integrate with Superapp + fix EVA binary transport
March 2024 finish writing Introduction + Problem description chapters
April 2024 integrate distributed machine learning: #7254
May 2024 Do experiments + finish writing experimental section thesis
1 June 2024 Thesis done
13 June 2024 Tentative Graduation Date 💥

@OrestisKan

OrestisKan commented Feb 1, 2024

Updates on Lebara Research:

A single run looks like this: mappings from any local port land in some specific "buckets" (ranges) of ports, as shown in the attached image (2024-02-01).

The problem is that these buckets are not consistent across runs: they change based on the port timeouts and the number of requests, and even this dependence is not consistent (after analysis).

What we know for sure:

  • Around 6k ports were never mapped to, across 21 runs and ~50 hours of running in total
  • The NAT chooses a random port, say X, and then each consecutive mapping gets X+1, X+2, ... (if X+i is available; if not, then X+i+1 and so on) until some unknown condition is met or no free port exists nearby. Then the NAT picks another random starting port and continues to increment linearly.
  • In each run, some buckets are often repeated (specific ranges that appeared even ~80 times in some runs)
  • Some ranges appeared as often as in 10/21 runs
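The observed behaviour can be turned into a toy generative model, useful for testing a biased birthday attack offline before spending SIM data. This is a hypothetical sketch: the "unknown condition" that ends a run is replaced by a fixed `run_length` parameter (an assumption, not a measured value), and ports below 1024 are treated as reserved.

```python
import random
from typing import List, Optional


def simulate_allocations(n: int, run_length: int = 256,
                         seed: Optional[int] = None) -> List[int]:
    """Toy NAT: pick a random seed port X, then allocate X+1, X+2, ... skipping
    taken ports; after run_length allocations (or at the top of the range) reseed."""
    if n > 65536 - 1024:
        raise ValueError("more allocations than free ports")
    rng = random.Random(seed)
    taken = set()
    out: List[int] = []
    current, left_in_run = None, 0
    while len(out) < n:
        if current is None or left_in_run == 0:
            current, left_in_run = rng.randrange(1024, 65536), run_length
        while current <= 65535 and current in taken:
            current += 1              # X+i busy -> try X+i+1
        if current > 65535:
            current = None            # no free port above: pick a new seed
            continue
        taken.add(current)
        out.append(current)
        current += 1
        left_in_run -= 1
    return out
```

Comparing the bucket statistics of this model against the 21 real runs is one way to test hypotheses about the reseed condition (e.g. replacing the fixed `run_length` by a timeout or a request count).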

Breakdown follows:

Ranges of ports that were never mapped

[(0, 1023), (19200, 19711), (40959, 41471), (49408, 49919), (60918, 65535)]
[Figure: larger scatter plot]

Frequently Observed Ranges

The ranges and their occurrence frequency (fraction of the 21 runs in which each range appeared):

[35328, 35583] -> 0.4762
[9216, 9471] -> 0.2857
[55040, 55295] -> 0.2857
[15616, 15871] -> 0.2381
[29440, 29653] -> 0.2381
[43520, 43775] -> 0.2381
[61184, 61439] -> 0.2381

I chose a 35% probability for all other ports and created a function.

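```python
import random


def pick_next_port():
    # build_ranges_lists_and_weights, ranges_with_occurrence_frequency and
    # reachable_ports come from the surrounding analysis script (not shown).
    list_of_ranges, weights_list = build_ranges_lists_and_weights(
        ranges_with_occurrence_frequency, reachable_ports)
    # Weighted choice of a range, then a uniform choice of a port inside it;
    # each range must be a sequence of ports (e.g. a range object).
    chosen_range = random.choices(list_of_ranges, weights=weights_list)[0]
    return random.choice(chosen_range)
```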
This gives the next port to send to. In this case the input is Lebara-specific; soon it will be based on the receiver's telecom provider.

This function takes about 86 µs (8.6e-05 s) to run, so it is fast enough not to interfere with the app's speed (and it should be faster once ported to Kotlin).
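A self-contained variant of the weighted picker, filling in the unshown helper with explicit assumptions: the bucket table and never-mapped ranges are copied from the breakdown above, the "35% for all other ports" is read as a weight of 0.35 for one bucket containing every remaining port outside the never-mapped ranges, and `biased_next_port` is a hypothetical name. The frequent buckets are used as listed, even though one of them ([61184, 61439]) falls inside the never-mapped range (60918, 65535) in the data above.

```python
import random

# Frequently observed Lebara buckets (inclusive bounds) -> fraction of the 21 runs.
FREQUENT = {
    (35328, 35583): 0.4762, (9216, 9471): 0.2857, (55040, 55295): 0.2857,
    (15616, 15871): 0.2381, (29440, 29653): 0.2381, (43520, 43775): 0.2381,
    (61184, 61439): 0.2381,
}
# Ranges never mapped in 21 runs (inclusive bounds).
NEVER = [(0, 1023), (19200, 19711), (40959, 41471), (49408, 49919), (60918, 65535)]
OTHER_WEIGHT = 0.35  # assumption: weight of the "all other reachable ports" bucket


def build_buckets():
    """Return (list of port sequences, matching weights) for weighted sampling."""
    frequent = [range(lo, hi + 1) for lo, hi in FREQUENT]
    weights = list(FREQUENT.values())

    def excluded(port: int) -> bool:
        in_never = any(lo <= port <= hi for lo, hi in NEVER)
        return in_never or any(port in r for r in frequent)

    other = [p for p in range(65536) if not excluded(p)]
    return frequent + [other], weights + [OTHER_WEIGHT]


BUCKETS, WEIGHTS = build_buckets()


def biased_next_port(rng=None) -> int:
    """Pick a bucket with probability proportional to its weight, then a port in it."""
    rng = rng or random  # module-level functions unless a seeded Random is passed
    bucket = rng.choices(BUCKETS, weights=WEIGHTS)[0]
    return rng.choice(bucket)
```

Building the buckets once at import time keeps each call to a weighted choice plus a uniform choice, in line with the timing reported above.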

I will continue to analyze and try to find more relations for now. Unfortunately, the "seed port" choice seems pretty random.

If anyone knows a data scientist or mathematician who can help, that would be great, because this is getting outside my area of knowledge.

I would also like @synctext's insights on any machine learning or statistical approaches, because at the moment this is outside my realm and I only do a random weighted choice over ranges.

Fallback Mechanism Proposal

Since it is now established that the NAT picks a random "seed port" and then increments linearly, I want to test whether the linear incrementation is affected by the IP of the receiver, i.e. whether each new receiver forces a new random seed port. If not, we can use a STUN-like server to log the initial seed port, and the other peer will then have a starting point.

@synctext

synctext commented Feb 2, 2024

Solid progress!
Please keep full focus on measuring several SIM cards. Afterwards we can exploit them. Easy tricks, like one side making 64k port attempts counting down with -1 while the remote side's SIM NAT counts up with +1? So one side sits behind a carrier-grade NAT with an integral +1 algorithm. Our side wants to connect, so it starts with the highest port number and counts down with -1. Without any failure or timeout they are guaranteed to connect within 64k attempts, usually somewhere in the middle. Symmetric means that the +1 side's mapping must target the correct UDP port of the -1 side; only then will it "open up" for an incoming packet. Everything needs to match 😭
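Under idealised assumptions (no packet loss, no hole timeouts, a strict +1 counter, and every hole opened by the +1 side accepting the prober's packets, i.e. ignoring the matching problem raised at the end of the comment above), the counting argument can be checked in a few lines. `steps_until_hit` is a hypothetical helper: at step s the +1 side has opened ports seed..seed+s and the prober is sending to port 65535-s.

```python
def steps_until_hit(seed_port: int, top: int = 65535) -> int:
    """First step at which the counting-down probe lands on an already-open hole."""
    for step in range(top + 1):
        highest_open = seed_port + step  # holes opened so far: seed_port..seed_port+step
        probe = top - step               # the other side counts down from the top
        if seed_port <= probe <= highest_open:
            return step
    raise ValueError("no hit")


# Equivalent closed form: ceil((top - seed_port) / 2) == (top + 1 - seed_port) // 2
```

In this model a hit takes at most 32,768 steps per side (seed port 0), so the two sides together send roughly 64k packets in the worst case and about half of that for a typical random seed, consistent with "within 64k attempts, usually somewhere in the middle".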
Working on rate control: 1k packets seems to be the maximum the server can handle before it starts dropping. ToDo later: quantify the exact server drop behaviour.
Behavioural modelling chapter in the thesis: please use a Markov state-transition model for the symmetric NAT behaviour of Lyca, T-Mobile (flaky 😮), Lebara, ??Cyprus??. Debug info for cell tower:

ADDED: "89 characters of base-11?! Mobile networking in rural Ethiopia!" by Ben Kuhn, on YouTube.

@OrestisKan

  • gather data from other providers + MTN
  • Analyse and model the data and behaviour of each NAT
  • Write problem description and methodology

@synctext

synctext commented Feb 26, 2024

  • in Paris doing 4G measurements
  • Orange (no prepaid, ID check mandatory, 15-minute chat in French required) and SFR France SIM cards to start with; 30 euros.
  • Buy Lebara again in France, because its hardware or Internet settings could differ. Goal is to get beyond 50 SIM cards in one thesis photo, Figure 1.
  • Buy Bouygues, O2, Free Mobile, Nomad, Traveltomtom?
  • Experiment with e-SIM? You order an e-sim card for France on the internet, you receive a QR code, scan it, follow the simple steps and within less than 2 minutes you have a France e-sim card installed on your phone.
  • Please phase and plan your research. No generic tool yet. No QUIC or uTP binary transfer yet. But invest in a good measurement script to deeply understand the port behaviour ❗ How predictable and repetitive is the port selection? Is the mapping behaviour completely known? Port ranges? Timeouts? MTU: measure it please! Jumbo frames???
  • Current measurement script
    • uses the server in Delft with lots of ports
    • gathering the mappings of the 4G NATs
    • timeout measurements between 2 phones. increasing time-out, waiting for packet drop. Fragile measurement methodology: timeout==end-of-measurement.
  • Sprint goal: Belgium and Oslo, Norway next. Process the mappings. If time is left: get this uTP lib going between 2 servers (and phones). Start writing down the mapping results.
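The MTU check can be sketched as a binary search on the UDP payload size, assuming delivery is monotonic (if a size gets through, every smaller size does too). `largest_delivered_payload` is a hypothetical helper and `delivers(size)` stands in for a real send-and-acknowledge exchange. For the result to reflect the path MTU, probes should be sent with the IPv4 Don't Fragment bit set (on Linux, via the `IP_MTU_DISCOVER` socket option); otherwise fragmentation can hide the limit. Over IPv4 without options, MTU = payload + 28 bytes (20-byte IP header, 8-byte UDP header), so an unfragmented payload above 1472 bytes would indicate a path MTU above 1500 (jumbo frames).

```python
from typing import Callable


def largest_delivered_payload(delivers: Callable[[int], bool],
                              lo: int = 0, hi: int = 65507) -> int:
    """Largest UDP payload size (bytes) for which delivers(size) is True.

    65507 is the maximum UDP payload over IPv4 (65535 - 20 - 8).
    Assumes monotonic delivery; needs about 16 probes for the full range.
    """
    if not delivers(lo):
        raise ValueError("even the smallest payload is not delivered")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if delivers(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Unlike the idle-timeout measurement, a failed MTU probe does not break the mapping, so bisection is cheap here.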

@synctext

  • 2009 NATCracker: NAT combinations matter. References the birthday paradox; great work by KTH. Claims 27 unique NAT types exist?? 🤔
  • 2005 NATBLASTER: Establishing TCP connections between hosts behind NATs
  • 2024 NATexploder: decentralised federated learning with NAT puncturing 😅
    • ChatGPT 3.5 prompt: Cool existing NAT puncturing algorithms are natcracker and natblaster. please give me another cool name?
    • "NATgrenade", "NATmaverick" more AI generated names
    • "How about "NatForge"? It conveys the idea of crafting or forging a pathway through network address translation (NAT) barriers. It also implies strength and resilience, suggesting that the algorithm can efficiently navigate through NAT configurations."

@OrestisKan

OrestisKan commented Mar 18, 2024

Updates 18/03

  • The timeout test is server-based and tests both the timeout between sends and the response timeout
  • Added MTU calculation and jumbo frames check
  • Added a function to measure the number of simultaneous mappings the NAT can hold
  • Added a feature to gather mappings without closing the ports, since some data showed that the NAT mapping disappears when the socket closes (too early to analyse results)
  • currently gathering Belgium data
  • End of the week will gather Norway data

Roaming update: for some SIM cards nothing changes when you leave home, because traffic is tunnelled home (virtually nothing changes). Lyca NL, Lebara NL, MTN CY, and Lebara FR were tested and change their IP while roaming.

Check whether, while roaming, the SIM behaves the same as the partner network (open research question).

Server right now:

  • Listens on all 65.5k sockets concurrently.

  • Supports timeout tests, MTU tests, concurrent connections tests and mapping test

  • Create a probability density function on the port mappings.

  • Create state transition function for when the "buckets" shift (stochastic function)
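The state-transition to-do (and the Markov model requested earlier for the thesis) could start from a maximum-likelihood transition matrix: label each observed external port with its bucket, then count consecutive bucket pairs. `bucket_of` and `transition_matrix` are hypothetical helpers, and the 256-port bucket width is an assumption suggested by the Lebara ranges reported earlier, most of which start on a multiple of 256.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List


def bucket_of(port: int, width: int = 256) -> int:
    """Label a port by the start of its width-aligned bucket (e.g. 35330 -> 35328)."""
    return (port // width) * width


def transition_matrix(states: List[Hashable]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Maximum-likelihood estimate of P(next state | current state) from one sequence."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }
```

Usage: `transition_matrix([bucket_of(p) for p in external_ports_in_sending_order])`. With one bucket label per state the matrix is large and sparse; a coarser state such as "run continues" versus "new seed port" may be easier to estimate from 21 runs.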

@synctext

synctext commented Mar 18, 2024

  • Blockchain Engineering master students are working on binary transfer: one team, plus another advanced team {repo}
  • Weird story about UDP.close() triggering the destruction of port mappings 💩
    • no birthday paradox if you close ports
    • clear protocol layering and decoupling violation
    • my advice: invest at most 2h in Wiresharking this phenomenon
  • current 65k open socket server strategy is random
    • starts transmission from random socket
    • all other packets are also random
    • Randomness is essence of birthday collision. If other side has non-randomness, don't do a trivial linear sweep.
    • the reply strategy: off by default (only for time-out measurements)
    • future feature???: reply with 1 UDP response to measure random packet loss
    • record when "connected" if the IPv4 address changes of carrier-grade NAT. (test with 4G.reboot()??)
  • This sprint Probability density function of multiple SIM cards posted on this issue please. As you reported until some unknown condition is met. Try to find out what is happening or what are the state transition probabilities. and first thesis writing: why, what and how you are measuring.
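
To put numbers on "randomness is the essence of birthday collisions": if one side holds n open mappings placed uniformly at random and the other side probes m distinct random ports, the chance of at least one hit follows the hypergeometric "all probes miss" product below. A minimal sketch, assuming uniform and independent placement, which is exactly the assumption that a structured (non-random) NAT breaks.

```python
PORT_SPACE = 65536  # 16-bit port numbers


def hit_probability(open_ports: int, probes: int, space: int = PORT_SPACE) -> float:
    """P(at least one probe hits an open mapping).

    Assumes the other side holds `open_ports` mappings placed uniformly at random
    and we send `probes` packets to distinct, uniformly random ports, so
    P(all probes miss) = prod_{i < probes} (space - open_ports - i) / (space - i).
    """
    if not (0 <= open_ports <= space and 0 <= probes <= space):
        raise ValueError("counts must lie within the port space")
    miss = 1.0
    for i in range(probes):
        free = space - open_ports - i  # closed ports not yet probed
        if free <= 0:
            return 1.0  # every remaining port is open: a hit is certain
        miss *= free / (space - i)
    return 1.0 - miss


for n_probes in (10, 256, 2048):
    print(n_probes, round(hit_probability(256, n_probes), 4))
```

With 256 open mappings, 10 probes give only about a 4% chance of a hit, whereas 2048 probes push it above 99.9%, so the number of tries has to be reported alongside any success rate.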

Update: idea to use more external IPv4 addresses on your server. That means expanding your testing infrastructure with probing from multiple addresses. Can you start measuring for a while from 1 address and predict what the other address will see as the port mapping? {hope this is understandable}.
If any stranger on the Internet can help you predict your port mapping (or not), you've made scientific progress. Both positive and negative outcomes are progress and thesis material. Thanks. The following 5 IPs are assigned to your server: YYY.ZZZ.119.XXX :
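
One way to make the "predict from one address" question concrete: learn the most common step between consecutive mappings seen from address A, extrapolate one step, and score the guesses against what address B actually sees. This is a hypothetical baseline that assumes sequential allocation with a constant stride; for a NAT that allocates ports uniformly at random, its hit rate should be close to chance, which is itself a usable negative result.

```python
from collections import Counter

PORT_SPACE = 65536


def most_common_delta(ports: list[int]) -> int:
    """Most frequent step between consecutive mappings, modulo the port space."""
    if len(ports) < 2:
        raise ValueError("need at least two observed mappings")
    steps = Counter((b - a) % PORT_SPACE for a, b in zip(ports, ports[1:]))
    return steps.most_common(1)[0][0]


def predict_next_mapping(ports_seen_by_a: list[int]) -> int:
    """Guess the external port of the next new mapping (e.g. the one address B will see)."""
    return (ports_seen_by_a[-1] + most_common_delta(ports_seen_by_a)) % PORT_SPACE


def hit_rate(predicted: list[int], observed: list[int]) -> float:
    """Fraction of predictions that exactly match what the second address observed."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need equally long, non-empty lists")
    return sum(p == o for p, o in zip(predicted, observed)) / len(predicted)


seen_by_a = [40000, 40002, 40004, 40006]  # made-up sequential mappings
print(predict_next_mapping(seen_by_a))  # 40008
```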

@OrestisKan

OrestisKan commented Apr 5, 2024

Gathered Belgian and Norwegian data this week and fixed the server bugs that were causing it to crash. Updated the paper with some changes to the measurements used and the data gathered.

I believe I now have a large enough number of SIM cards, so I'll focus on analyzing the results of this sprint.

Todo:

  • Create a probability density function of the port mappings.
  • Create a state-transition function for when the "buckets" shift (stochastic function).
  • Make use of the multiple IP addresses! Measure for a while from one address and then try to predict what the other address will see as a port mapping.
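
For the state-transition function, a first approximation is to treat the bucket a new mapping lands in as a first-order Markov chain and estimate the transition probabilities by counting. A sketch under those assumptions: defining a bucket as `port // 256` is hypothetical here, and a first-order chain may miss time-of-day or load effects.

```python
from collections import Counter, defaultdict


def transition_matrix(buckets: list[int]) -> dict[int, dict[int, float]]:
    """Maximum-likelihood first-order Markov estimate of P(next bucket | current bucket)."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for cur, nxt in zip(buckets, buckets[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }


# Hypothetical bucket definition: the 256-port block a mapping falls in.
ports = [5000, 5001, 5002, 5300, 5003, 5004]
print(transition_matrix([p // 256 for p in ports]))  # {19: {19: 0.75, 20: 0.25}, 20: {19: 1.0}}
```

Rows with few observations give noisy estimates, so it is worth reporting the transition counts next to the probabilities.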

First_5G_deployment_of_Distributed_Artificial_Intelligence.pdf

Planning to charter a private Piper aircraft soon to do a SIM-card run in another EU member state

@synctext
Member Author

synctext commented Apr 5, 2024

  • Your thesis does not mention why symmetric NATs exist
  • Good start on the Problem Description: "we want distributed machine learning, hence we do ultra low-level networking stuff". You need a more structured storyline:
    • strong focused opening like: "We conducted successful birthday attacks on the Vodafone 5G network to enable pure peer-to-peer based decentralised learning. Our motivation is to empower users to freely use generative AI, without Big Tech, clouds, or servers in general."
    • We aim to give users democratic strategic autonomy
    • Machine learning is essential, but without cloud please. No privacy loss please.
    • We cannot yet build heavy machine-learning-based apps such as TikTok or YouTube that preserve user privacy and autonomy and are essentially cloud-free.
    • Hence we need freedom of communication in 4G/5G to form a cloud-free phone-to-phone overlay and run decentralised federated machine learning.
    • "5G overlay infrastructure for decentralised learning" 💥 😮 👏
    • So no "4 types of NATs according to Huawei" on the first pages!
  • How could we use a blog post on a high-traffic nerd website for your thesis?
    • point to your cool stuff (requirement: thesis draft .PDF)
    • point to Android .APK on apps store for broad volunteer testing
    • crawl results for thesis graphs
    • make figures && make thesis.pdf && defend && DONE 🏁
    • Actually, deploying, measuring 5G, and crawling is no small task; it is like a secondary thesis project
  • (thesis writing style) Preliminary results indicate that for 16 SIM-SIM card pairs we obtained an average birthday-paradox attack success rate of 10-20%, with one outlier of 40% success with 10 tries (e.g. add statistical caveats). We are investigating the low probability of success and aim to transition from a brute-force methodology towards biased attacks. We are currently collecting more data from our 17 procured SIM cards. 1 week is planned for integrating decentralised k-means-clustering-based ML.
  • Script cannot yet exploit the multiple server IPv4 addresses for 5G measurements
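
On the stats caveats: with only 10 tries per pair, a binomial confidence interval shows how wide the uncertainty is. A Wilson score interval is one standard choice (the thread does not prescribe a method); this sketch assumes the 40% outlier means 4 successes out of 10 tries.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z = 1.96 gives ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(4, 10)  # the 40% outlier: 4 successes in 10 tries
print(f"{low:.2f} to {high:.2f}")  # 0.17 to 0.69
```

An interval spanning roughly 17% to 69% shows that a single 10-try run pins the success rate down only very loosely.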

@OrestisKan

OrestisKan commented Apr 21, 2024

Updates last sprint:

  • Lebara
    Uses 256 queues of 256 ports each, with incremental (sequential) port numbers. The first port of each queue always has a port number divisible by 256.

  • KPN
    Same as Lebara but with fewer users per NAT; thus, in 32.5% of queue assignments, a user is assigned a full queue. In almost 40% of the cases, the initial port of the queue (port number mod 256 = 0) is available.

  • Lyca
    Port numbers are randomly assigned: they follow no order, show no particular mapping, and span the whole port-number space (after analyzing 0.5M mappings)

  • Vodafone:
    Ports seem to be randomly assigned but span only the first 15k port numbers, with some bias towards the middle of that range. Some port numbers are used for multiple connections. Requires more analysis; the distribution looks roughly normal.
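
A sketch of the 256 x 256 queue model reported above for Lebara (and, with fewer users per NAT, KPN): the queue index is the port divided by 256, and the queue's first port is the multiple of 256 at or below it. Ordering the remaining ports of the same queue by distance from an observed port is a hypothetical probing heuristic, not a measured policy.

```python
QUEUE_SIZE = 256  # ports per queue, as observed for Lebara (and KPN)


def queue_of(port: int) -> tuple[int, range]:
    """Return (queue index, port range) for a port under the 256 x 256 queue model."""
    if not 0 <= port < 65536:
        raise ValueError("not a 16-bit port")
    q = port // QUEUE_SIZE
    start = q * QUEUE_SIZE  # first port of each queue is divisible by 256
    return q, range(start, start + QUEUE_SIZE)


def candidate_ports(observed_port: int) -> list[int]:
    """Other ports of the same queue, nearest to the observed port first (hypothetical heuristic)."""
    _, ports = queue_of(observed_port)
    # sorted() is stable, so on equal distance the lower port comes first
    return sorted((p for p in ports if p != observed_port), key=lambda p: abs(p - observed_port))


q, ports = queue_of(5000)
print(q, ports.start, ports.stop - 1)  # 19 4864 5119
print(candidate_ports(5000)[:4])  # [4999, 5001, 4998, 5002]
```

If the queue-assignment policy turns out to favour ports above the last one handed out, the sort key is the place to bias the search.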

Updated thesis:

Next Sprint:

  • Discuss the thesis committee and schedule the defence day, since I'll need to book tickets for it, and summer will make it harder to find a day that suits everyone
  • Fix problem description
  • Finish analyzing the behaviour of the NATs (Vodafone and other European ones; figure out the policy by which users are assigned to a queue for Lebara and KPN)
  • Use the insights from these analyses in a birthday attack, evaluate whether it improves, and document the results

@synctext
Member Author

synctext commented Apr 24, 2024

  • great progress with thesis!
  • No "Fix problem description" yet please; first finish the experiments and the thesis section
  • Work towards green light moment please 🍏
  • UK, Romania, Croatia SIM cards plans are ongoing 👏 (16 SIMs operational)
  • Buy e-SIMs on the Internet for the final big thesis numbers 🐎
  • 24h prediction cycle. Congestion causes increased prediction difficulty. Empty at night 🌃
  • passport requirement for enrolment!??
  • Apple might cause an explosion of on-device LLM research 😲
  • TABLE II: NAT types of all the carriers tested and the location of the test
    • add the Norway 6 seconds and the Vodafone 300 seconds
    • Puncture difficulty: 👌 to 🔥 🔥 🔥 🔥 🔥 ?
    • try some mod-256 math expression (like Bubblesort complexity or a trust formula)
    • Example from prior Delft thesis work “Universal Trust Machine”, https://arxiv.org/abs/2301.06938
  • Student work of IPv8 on Android 3 MByte/sec binary transfer 💥
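
In the spirit of the mod-256 suggestion, the Lebara/KPN queue model from the previous update could be written as follows (a sketch; the symbols $p$ for the external port, $q$ for the queue, and $o$ for the offset are introduced here):

$$
p = 256\,q + o, \qquad q = \left\lfloor \frac{p}{256} \right\rfloor \in \{0,\dots,255\}, \qquad o = p \bmod 256 \in \{0,\dots,255\}
$$

Knowing a peer's queue $q$ shrinks the candidate set from $2^{16} = 65536$ ports to $2^{8} = 256$, and the KPN "initial port of the queue" case is simply $o = 0$, i.e. $p \equiv 0 \pmod{256}$.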

@OrestisKan

Vodafone NL port mappings fitted to a beta distribution
[Figure: Vodafone NL beta-distribution fit]
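
The figure does not say how the beta distribution was fitted; one simple option is the method of moments on ports rescaled to (0, 1). A sketch, assuming the ports lie in the first ~15k numbers as reported for Vodafone NL (the `max_port=15000` default is that assumption, not a measured bound).

```python
import random
import statistics


def fit_beta_moments(ports: list[int], max_port: int = 15000) -> tuple[float, float]:
    """Method-of-moments Beta(alpha, beta) fit for ports rescaled into (0, 1)."""
    xs = [(p + 0.5) / max_port for p in ports]  # +0.5 keeps values off the 0 boundary
    m = statistics.fmean(xs)
    v = statistics.pvariance(xs, m)
    if not 0 < v < m * (1 - m):
        raise ValueError("sample variance is incompatible with a beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Synthetic check: draw ports from Beta(2, 5) scaled to 15k and recover the parameters.
random.seed(0)
sample = [int(random.betavariate(2, 5) * 15000) for _ in range(20_000)]
alpha, beta = fit_beta_moments(sample)
print(round(alpha, 2), round(beta, 2))  # should land near 2 and 5
```

A maximum-likelihood fit (for example SciPy's `scipy.stats.beta.fit`) would be the natural next step, and comparing the two fits is a cheap sanity check on the "normal-looking" shape noted earlier.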
