ePython memory when running with offload #7

@jungla88

Description

Hi,

I have a strange issue with an algorithm I'm writing.
Basically the code I have written works well, and now I'm testing how large the input data can get before the cores/shared memory run out of memory.
To do this I use a for loop (in CPython) to progressively increase my input data:

lineList=None			
define_on_device(lineList)

myMat=None			
define_on_device(myMat)

myVect=None
define_on_device(myVect)

myPattern1=None
define_on_device(myPattern1)

myPattern2=None
define_on_device(myPattern2)

@offload
def append(index,myID,myCore):
    from util import range
    from parallel import coreid
    # On the targeted core, store the pattern ID and the pattern values
    # from lineList in row 'index' of myMat.
    if myCore==coreid():
        myMat[index][0]=myID
        for i in range(0,len(lineList)-1):
            myMat[index][i+1]=lineList[i]

@offload
def init_pattern(mySizeDataSet,myPattDim):
    import array
    from parallel import coreid, numcores
    from util import range
    # Split the dataset rows over the numcores()-1 worker cores; cores whose
    # id is at most the remainder take one extra row.
    if coreid()<=mySizeDataSet%(numcores()-1):
        myLength=(mySizeDataSet/(numcores()-1))+1
    else:
        myLength=mySizeDataSet/(numcores()-1)
    lineList=[0]*myPattDim
    myPattern1=[0]*myPattDim
    myPattern2=[0]*myPattDim
    # Per-core matrix: one row per local pattern, pattern ID in column 0.
    myMat=array(myLength,len(lineList)+1)
    for i in range(0,myLength-1):
        for j in range(0,len(lineList)):
            myMat[i][j]=0
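
For reference, here is a minimal host-side sketch (plain CPython, no ePython calls) of the per-core row count that init_pattern computes: the dataset is split over numcores()-1 worker cores, and cores whose id is at most the remainder take one extra row. The names rows_for_core and num_worker_cores are hypothetical, introduced only for illustration.

# Minimal sketch of the per-core row count computed in init_pattern above;
# rows_for_core and num_worker_cores are illustrative names, not part of the code.
def rows_for_core(core_id, size_dataset, num_worker_cores):
    # Mirrors the coreid() <= mySizeDataSet % (numcores()-1) test above.
    if core_id <= size_dataset % num_worker_cores:
        return size_dataset // num_worker_cores + 1
    return size_dataset // num_worker_cores

# Example: 256 patterns spread over 15 worker cores (i.e. numcores()-1 == 15).
print([rows_for_core(c, 256, 15) for c in range(15)])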

sizePatternStop=...
for sizePattern in range(1,sizePatternStop):
    ...
    input=list(itertools.chain.from_iterable([[numpy.random.uniform(size=sizePattern).tolist()] for i in range(0,256)]))
    input1=[]
    index=range(0,len(input))
    input1=zip(index,input)
    ...
    arrayListNode=input1
    indexNode,patternNode=zip(*arrayListNode)

    lenDataSetNode=len(patternNode)
    pattern_dim=len(patternNode[0])

    init_pattern(lenDataSetNode,pattern_dim)

    cnt=0
    for i in range(0,len(patternNode),NUM_CORES):
        last=len(patternNode)-i
        if last>=NUM_CORES:
            targetCores=NUM_CORES
        else:
            targetCores=last
        for cores in range(targetCores):
            # send the pattern to one core, then record it with its ID in myMat
            copy_to_device("lineList",patternNode[i+cores],target=[cores])
            patternID=float(indexNode[i+cores])
            append(cnt,patternID,cores)
        cnt+=1
    ...
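
To make the chunking above explicit: cnt advances once per chunk of NUM_CORES patterns, so pattern i+cores ends up in row cnt of myMat on core number cores. A tiny host-side sketch of that mapping (pattern_slot is a hypothetical helper name, not part of the code):

# Sketch of the (row, core) placement implied by the chunked loop above;
# pattern_slot is an illustrative helper, not part of the original code.
def pattern_slot(pattern_index, num_cores):
    # cnt increments once per chunk of num_cores patterns,
    # while 'cores' cycles within the chunk.
    return pattern_index // num_cores, pattern_index % num_cores  # (row, core)

# Example with 16 cores: patterns 0..15 fill row 0, patterns 16..31 fill row 1, ...
print([pattern_slot(p, 16) for p in (0, 15, 16, 31)])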

The problem is that it starts and always completes a full iteration, but at some iteration (nothing particularly special about it, as far as I can tell) copy_to_device runs into problems: either it blocks everything and I have to exit the shell manually, or it gets stuck complaining that

Error from core 8: Too many array indexes in expression
or sometimes (in a multi-node cluster with MPI) it reports that a core is out of memory for a given sizePattern. But if I then manually set that value as the starting one, it completes the full iteration and gives the same error for the next one.

My suspicion is that memory is being managed incorrectly as sizePattern increases: the old matrices (myMat and the others) may still be in memory from one cycle to the next, so the device memory fills up prematurely.
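
One way to sanity-check this is to estimate how much device memory the per-core structures need as sizePattern grows. A rough back-of-the-envelope sketch, assuming the 256 patterns from the snippet above and, as a guess, 15 worker cores; BYTES_PER_ELEMENT is also a guess for the per-element storage cost, not an ePython constant:

# Rough per-core memory estimate for the structures allocated in init_pattern;
# BYTES_PER_ELEMENT and the 15-worker-core count are assumptions for this check.
def per_core_elements(size_pattern, size_dataset=256, num_worker_cores=15):
    rows = size_dataset // num_worker_cores + 1   # worst case: a core with the extra row
    mat = rows * (size_pattern + 1)               # myMat: rows x (pattern_dim + 1)
    vectors = 3 * size_pattern                    # lineList, myPattern1, myPattern2
    return mat + vectors

BYTES_PER_ELEMENT = 4  # assumption; adjust to whatever ePython actually uses per element
for size_pattern in (8, 32, 64, 128):
    print(size_pattern, per_core_elements(size_pattern) * BYTES_PER_ELEMENT, "bytes (estimate)")

Comparing the estimate for a failing sizePattern with the previous working one should help tell apart "a single cycle is already too big" from "old buffers survive between cycles".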

Could somebody take a look at this, please?
