Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: Improve data streaming backend #2390

Merged
merged 17 commits into from
Jun 28, 2024
Merged

Conversation

FabioLuporini
Copy link
Contributor

Yet another bunch of misc compiler tweaks and generalisations, mostly to better support lazy data streaming in PRO

Copy link

codecov bot commented Jun 19, 2024

Codecov Report

Attention: Patch coverage is 52.97619% with 79 lines in your changes missing coverage. Please review.

Project coverage is 86.69%. Comparing base (e37d6ff) to head (4d08839).

Files Patch % Lines
devito/operator/operator.py 43.95% 49 Missing and 2 partials ⚠️
tests/test_gpu_common.py 0.00% 7 Missing ⚠️
devito/passes/iet/instrument.py 20.00% 4 Missing ⚠️
devito/types/basic.py 66.66% 4 Missing ⚠️
devito/types/parallel.py 66.66% 3 Missing and 1 partial ⚠️
devito/mpi/halo_scheme.py 40.00% 3 Missing ⚠️
devito/ir/clusters/algorithms.py 66.66% 1 Missing and 1 partial ⚠️
devito/ir/stree/algorithms.py 0.00% 2 Missing ⚠️
devito/arch/archinfo.py 66.66% 1 Missing ⚠️
devito/types/dense.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2390      +/-   ##
==========================================
- Coverage   86.78%   86.69%   -0.09%     
==========================================
  Files         234      234              
  Lines       44271    44383     +112     
  Branches     8184     8216      +32     
==========================================
+ Hits        38419    38477      +58     
- Misses       5137     5185      +48     
- Partials      715      721       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -19,6 +19,8 @@
'get_cuda_path', 'get_hip_path', 'check_cuda_runtime', 'get_m1_llvm_path',
'Platform', 'Cpu64', 'Intel64', 'IntelSkylake', 'Amd', 'Arm', 'Power',
'Device', 'NvidiaDevice', 'AmdDevice', 'IntelDevice',
# Generic
'CPU64',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SO both CPU and Cpu ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CPU64 = Cpu64('cpu64')

in archinfo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agrred, CPU64 + Cpu64 seems like there shound't need both

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one is the class the other an instance I agree it's pretty horrible will check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done CPU64 -> ANYCPU

@@ -939,21 +955,23 @@ def _emit_apply_profiling(self, args):
if metrics:
perf("Global performance: [%s]" % ', '.join(metrics))

# Same as above, but excluding the setup phase, e.g. the CPU-GPU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the data transfer time from device to host is not included ?
is it reported somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdym ? this is just an additional line -- a line that does not include the setup phase

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will cross-check the logs again

devito/operator/operator.py Show resolved Hide resolved
devito/tools/abc.py Outdated Show resolved Hide resolved
metrics.append("%.2f s" % fround(v.time))
metrics.append("%.2f GPts/s" % fround(v.gpointss))

perf("Global performance <w/o setup>: [%s]" % ', '.join(metrics))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to run locally the PR and I do not see any diff in the log, should I ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should yes according to these changes

@@ -151,6 +151,22 @@ def __init__(self, a, b, c=4):

return cls(*args, **kwargs)

def _rreplace(self, subs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? this is just self._rebuild(**subs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not only was it pretty similar if not identical to _rebuild, but also -- I just noted, sorry -- it appears to be completely unused at this stage. Probably something I coded during development which I eventually managed to avoid using, hence I'm dropping it for good!

@@ -433,7 +433,7 @@ def callback(self, clusters, prefix, seen=None):

key = lambda i: i in prefix[:-1] or i in hs.loc_indices
ispace = c.ispace.project(key)
# HaloTOuch are not parallel
# HaloTouch's are not parallel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"HaloTouches"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@@ -975,21 +993,12 @@ def lower_perfentry(v):

if v.time <= 0.01:
# Trim down the output for very fast sections
name = "%s%s<>" % (k.name, rank)
name = "%s%s" % (k.name, rank)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you define this first thing in the function? Would save the repeated line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, OK

def _signature(self):
"""
The signature of an AbstractFunction is the set of fields that
makes it "compatible" with another AbstractFunction. The fact that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth defining "compatible" here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's encoded in the first part of the sentence

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what sense? The current docstring doesn't help me understand what it means for one AbstractFunction to be compatible with another in this context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to improve it in my next PR

@FabioLuporini FabioLuporini merged commit 4e86b12 into master Jun 28, 2024
31 checks passed
@FabioLuporini FabioLuporini deleted the data-stream-improvs branch June 28, 2024 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants