Commit 6f3bee8d authored by François Gindraud

Session 7 (finale)

parent 39ae6d7c
Branches master
@@ -14,4 +14,4 @@ Sessions :
 4. Unit testing. normalize example. Packaging & distributions concerns.
 5. Practice with paddedtensor example.
 6. Numerical problems (floats)
-7. *performance concerns*
+7. Performance 101, overview of hardware details
@@ -546,6 +546,7 @@ Optimizing this part of the code has the most effect.
 Use tools for the code language !
 Using a C profiler in python will tell you which interpreter internal function costs more, not functions from the python code.
 Compiled languages with C compatibility : perf + processing (flamegraphs, ...) is very effective.
+Python : cprofile.
 ## Optimize
 Many strategies are very dependent on the code.
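
The added `Python : cprofile.` line most likely refers to the standard-library `cProfile` module (note the capital `P` in the import name). As a minimal sketch of what profiling at the Python level looks like, the example below profiles a small workload and prints the functions sorted by cumulative time. The names `slow_sum_of_squares`, `run_workload` and `profile_report` are hypothetical and only serve the illustration; they do not come from the course material.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical hot spot: a deliberately naive Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_workload():
    """Hypothetical entry point that calls the hot spot several times."""
    return [slow_sum_of_squares(50_000) for _ in range(20)]


def profile_report(func, limit=5):
    """Run func() under cProfile; return (result, text report).

    The report lists at most `limit` entries, sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()

    # Collect the statistics into a string instead of printing directly.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_report(run_workload)
    print(report)
```

The report names Python-level functions such as `slow_sum_of_squares`, which is the information a C profiler attached to the interpreter would not give directly. For a whole script, `python -m cProfile -s cumulative script.py` produces a similar table without modifying the code (`script.py` being a placeholder name).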
@@ -568,9 +569,9 @@ Readability can be reduced in optimized code
 Hardware related : generally not possible in interpreted languages (lack of fine control).
 Computer has many sub-components that can affect performance :
-- cpu caches (data & instruction)
 - memory connection to cpu
 - memory internal structure
+- cpu caches (data & instruction)
 - TLB / MMU : virtual memory (mostly for security)
 - computing units number, organisation, timings : simple integer ALU (&,|,+...), complex integer (*,/,mod,...), floating point, vectorized
 - superscalar paradigm : "asynchronous" fetch/decode/execute