WO2013110816A2

WO2013110816A2 - Method of using a shared memory

Info

Publication number: WO2013110816A2
Application number: PCT/EP2013/051594
Authority: WO
Inventors: Yves Albrieux
Original assignee: Tymis
Priority date: 2012-01-27
Filing date: 2013-01-28
Publication date: 2013-08-01
Also published as: FR2986346A1; WO2013110816A3

Abstract

The present invention relates to a method of using, by a task, a memory shared between a plurality of data processing units connected by an application bus, said task being executed by one of the data processing units, the method being characterized in that it comprises steps of: (a) assignment by the application bus to the task of a triplet of resources comprising a semaphore, a shared memory area and a queue, the semaphore indicating a blocking state of the task, the shared memory area being a partition of a first shared memory block, a descriptor of the shared memory area being stored in a second shared memory block, said second block being dedicated to the storage of descriptors of the first shared memory block; (b) when the application bus notes that said assigned shared memory area is free and/or that the task has reached the head of the queue, modification of the semaphore so as to indicate a release state of the task; (c) use by the data processing unit of said assigned shared memory area for the execution of the task; (d) freeing of the space of the second shared memory block allocated to the storage of the descriptor of the shared memory area. The invention also relates to a method of parallel execution of a computer process and to a system for this purpose.

Description

METHOD OF USING A SHARED MEMORY

GENERAL TECHNICAL FIELD

The present invention relates to the field of parallel architectures.

More specifically, it relates to a method of using a shared memory by several applications or several tasks.

STATE OF THE ART

Parallel architectures have become the dominant paradigm of computer systems in recent years, with the simultaneous processing of a plurality of tasks multiplying performance.

However, one of the problems raised by parallelism is the management of memory space, in particular RAM. The first solution is for each computing unit to have its own storage space (this is called "distributed" memory). This system is, however, unnecessarily expensive (since an oversized memory must be provided in n copies to avoid overflow problems) and requires a heavy communication system between the memories.

As an alternative to distributed memory, there is so-called "shared" memory, that is to say memory used jointly by the different computing units, which avoids the aforementioned drawbacks. Access to the same memory by two or more processes, however, requires some precautions. From a hardware point of view, reading an area of memory must be forbidden while another task is rewriting it, otherwise the reader may obtain an inconsistent mixture of old and new data, and vice versa. Reading is indeed faster than writing. Moreover, two simultaneous writes cannot be allowed, otherwise there is a risk of a "race condition", whose result is unpredictable. Only simultaneous reads pose no problem.

We therefore see the need for a management system for access to shared memory between two or more tasks.

To solve this kind of problem, one possibility is to use locks, that is to say to be able to block, in a single instruction, all the processes trying to access a piece of data until the lock is released. This technique slows down parallel execution and is sometimes a source of bugs: if two tasks each require the same two variables, and the first task locks the first variable while the second task locks the second variable, then both tasks remain blocked indefinitely in what is called a "deadly embrace", or "deadlock".

Some programming methods, known as "non-blocking synchronization", try to avoid using these locks. They are nevertheless even more difficult to implement and require very specific data structures.

Modern operating systems can get around the problem by carefully preventing any interference between processes. This control requires strict management of all the resources of a computer upstream of the memory. It means, however, that a task on a machine of a first type cannot share memory with a task on a machine of a second type without a complex interface mechanism. Memory management is then said to be "machine-dependent".

The only "machine-independent" and standardized mechanism that exists today is IPC (Inter-Process Communication). However, IPC is limited by the kernel configuration of the operating system. Indeed, in most Linux distributions, IPC is generally limited to 4096 blocks with a maximum size of 32 MB each, and to 1656 message queues. This is insufficient to share the memories of more than about ten machines interconnected by an application bus.

Moreover, there is an API (application programming interface) reserved for scientific computing and HPC (High Performance Computing), called OpenMP, which is of the SPMD type (that is to say that OpenMP only applies between the tasks of a single application). Although based on multi-threading, OpenMP offers no solution for parallelism on a distributed architecture, nor for automatic parallelization. It requires the programmer to take the necessary precautions regarding synchronization and possible deadlocks. This API requires an add-on such as the Message Passing Interface (MPI), which is itself machine-dependent.

There is therefore a need for a new machine-independent method of secure management of shared memory that is much more efficient and flexible than known techniques.

PRESENTATION OF THE INVENTION

The present invention aims to solve these difficulties by proposing a method of use, by a task, of a memory shared between a plurality of data processing units connected by an application bus, said task being executed by one of the data processing units, the method being characterized in that it comprises steps of:

(a) Assignment by the application bus to the task of a resource triplet comprising a semaphore, a shared memory area and a queue, the semaphore indicating a blocking state of the task, the shared memory area being a partition of a first shared memory block, a descriptor of the shared memory area being stored in a second shared memory block, said second block being dedicated to the storage of descriptors of the first shared memory block;

(b) When the application bus finds that said assigned shared memory area is free and/or that the task has moved to the head of the queue, modification of the semaphore to indicate a release state of the task;

(c) Use by the data processing unit of said assigned shared memory area for the execution of the task;

(d) Freeing the space of the second shared memory block allocated for storing the descriptor of the shared memory area.

According to other advantageous and nonlimiting features of the invention:

The partition of the first block allocated as a shared memory area for the task is dynamically generated according to a memory space size requested by the data processing unit;

The first shared memory block is structured into a set of equal-sized units called "pages", each referenced by an index, the partition of the shared memory assigned as the shared memory area for the task being a set of pages, and said descriptor of the shared memory area being a page table containing the indexes of the pages composing said partition;

A page table containing the indexes of the pages composing a partition is represented in the second memory block by a linked list of pairs, each pair comprising a page index and the memory offset to be applied in the second block to reach the next pair of the linked list;

Step (a) comprises obtaining a triplet of identifiers from an IPC key, the resource triplet assigned to the task being identified by the triplet of identifiers;

Each identifier is an integer that is unique for each resource type of the triplet;

The task is associated with a first instruction and a second instruction, the first instruction being a non-blocking instruction requesting the execution of the task and the second instruction being a blocking instruction returning the result of the execution of the task, step (a) being implemented following the launch of the first instruction, and step (d) being implemented following the launch of the second instruction.

According to a second aspect, the invention relates to a method of parallel execution of a computer process by a plurality of data processing units connected by an application bus, the process being written in the form of a sequence of tasks, each task being an elementary action executable by a given processing unit to which said application bus is connected, the tasks being ordered according to possible dependency constraints vis-à-vis other tasks, the parallel execution method respecting the order indicated by the scheduling table, the execution of each task comprising the implementation of the method of use by the task of a memory shared by said data processing units according to the first aspect of the invention.

According to other advantageous and nonlimiting features of the invention:

The execution of a plurality of tasks (T) comprises the implementation of the method of use by the task of a shared memory by said data processing units according to the first aspect of the invention, the launch of the second instruction of such a task (T) being carried out synchronously with the request to execute a second task (T2) having a dependency constraint vis-à-vis the first task (T).

According to a third aspect, the invention relates to a system comprising a plurality of data processing units implementing the method according to the first or second aspect of the invention, at least one memory connected to a data processing unit, and an input receiving the plurality of tasks to be executed.

PRESENTATION OF FIGURES

Other features and advantages of the present invention will appear on reading the following description of a preferred embodiment. This description will be given with reference to the appended drawings in which:

FIG. 1a represents an application bus architecture;

FIG. 1b represents an application bus terminal architecture;

FIG. 2 represents the transition from an IPC key to a triplet of IPC resource identifiers;

FIG. 3 represents an example of a first block illustrating the phenomenon of fragmentation;

FIG. 4a represents an example of a first paginated block and the second associated block;

FIG. 4b illustrates the compacting problem in the second block of FIG. 4a;

FIG. 5 represents the request/response mechanism on an automatically synchronized channel.

DETAILED DESCRIPTION OF AN EMBODIMENT

Application bus architecture

With reference to the figures, the method of use by a task of a shared memory according to the invention is implemented via an application bus. Similar to ESBs (enterprise service buses), this bus primarily allows communication between applications that were not originally designed to work together.

The architecture of the Application Bus (BA) is physically distributed over a constellation of machines. It advantageously comprises the following software elements:

• The Application Bus Core (NBA), ideally hosted by a dedicated machine (this is not mandatory for small and medium configurations), responsible for dispatching and scheduling orders and data exchanges according to priorities;

• An Application Bus Multiplexer (MBA) per operating system (in other words, per host machine), managing the shared resources (in particular the processing unit or units, the memory space, etc.);

• An Application Bus Terminal (TBA) per application; this terminal can be a Master Terminal (Tm) if the application is the director, or a Slave Terminal (Ts) if the application is controlled.

As shown in FIG. 1a, an application bus may be interconnected with one or more other buses via their respective cores (NBA). A constellation of machines managed by a BA without a core cannot be linked to another constellation. Connections are for example implemented using standard protocols (TCP/IP, etc.), advantageously via ports readily accepted by firewalls, domain servers and conventional gateways, under a highly secure procedure.

At the machine level, the BA can only be accessed by an application via a TBA connected to the local MBA, as shown in Figure 1b.

Each application can launch threads and ask the BA to provide each of them with a dedicated terminal, which allows two threads to interact.

From a practical point of view, each terminal is able to carry out requests and responses in parallel on different channels. A dedicated "message pump" allows this terminal to be notified of the arrival of data on a particular channel from another terminal. Recent applications, or applications adapted to the BA ("native" applications), can communicate with it directly at the TBA level. Alternatively, older applications that are unaware of the BA are advantageously supported ("piloted" applications) through an interface specific to each of these applications, responsible for providing the adaptation that allows them to integrate a TBA: the CIA (interactive communication by PLC). A CIA is a kind of host driver with the resources needed for applications connected to bus terminals.

Each TBA has a plurality of communication channels with its application, through which requests for the execution of a task are made and responses are collected. A single channel is associated with each type of request/response. In this way, several exchanges can take place in parallel, since the channels are completely independent.

The BA thus offers the possibility of massively parallel processing of any process whose execution is required in a multi-core multi-machine environment regardless of the operating systems.

At the local level of each machine, communication and data exchange between the different processes is based on the IPC standard.

IPC Resources

The IPC solution for the secure sharing of information requires a triplet of "descriptors" of the task:

• A memory proper (in other words a "block"), of any length,

• An access request device composed of a message queue (where the requesting tasks come to "take their turn" in the queue),

• An access authorization and synchronization device: a "semaphore".

The triplet described above operates as follows: the task wishing to use a memory deposits its request in the corresponding queue. It specifies the action (read/write) and the protection semaphore, a kind of red/green light indicating whether the memory can be read or written. The task is blocked by this semaphore and is unblocked by the BA, which releases the semaphore only when the memory is ready to be used and/or it is the turn of said task to access it.
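The request/release cycle just described can be illustrated inside a single process. The following is a minimal sketch assuming Python threads stand in for tasks, a `queue.Queue` for the IPC message queue and a `threading.Semaphore` for the IPC semaphore; the class `Triplet`, its methods and the `writer` helper are hypothetical names, not part of any IPC API. For simplicity it serializes every access, reads included, whereas the text notes that simultaneous reads are harmless.

```python
import queue
import threading

class Triplet:
    """Hypothetical in-process stand-in for an IPC triplet (block, queue, semaphores)."""

    def __init__(self, size):
        self.memory = bytearray(size)           # the shared memory "block"
        self.requests = queue.Queue()           # FIFO queue where tasks take their turn
        self.finished = threading.Semaphore(0)  # task -> bus: "done with the memory"

    def acquire(self, action):
        # The task deposits its request with its own semaphore ("red" at 0), then blocks on it.
        light = threading.Semaphore(0)
        self.requests.put((action, light))
        light.acquire()

    def release(self):
        self.finished.release()

    def serve(self, n_requests):
        # Stand-in for the bus: turn each light "green" in FIFO order, one task at a time.
        for _ in range(n_requests):
            _action, light = self.requests.get()
            light.release()
            self.finished.acquire()

def writer(triplet, value, log):
    triplet.acquire("write")
    triplet.memory[0] = value                   # exclusive access to the block
    log.append(value)
    triplet.release()
```

Because the bus waits for each task to call `release()` before turning the next light green, the last value written to `memory[0]` is always the last one logged.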

If IPC is currently insufficient to manage more than a dozen machines, it is because current methods of using memory use a triplet per channel.

Assuming a constellation of 10 Linux machines with an average of 4 TBAs each offering 40 channels, we obtain a need for 10 × 4 × 40 = 1600 memory blocks and as many semaphores and queues. It is the queues that pose a problem, since the maximum number of message queues is 1649. We also note that more than a third of the maximum number (4096) of memory blocks are already used, whereas in most cases only a tiny fraction of their allocated space is actually exploited.

Of course, these limits are adjustable on all systems: the values can be modified and the OS kernel recompiled. But this manipulation, and the risks it carries, deter users, who are very rarely operating system specialists and are much more concerned with their business processes.

The method according to the invention aims to significantly increase the number of machines manageable by IPC without exceeding these limits. The idea is that all the channels of a TBA share only a few message queues, or even a single one, and above all share the same single large memory comprising a set of areas, each relating to a channel. The limit on semaphores (32000) poses no problem, as the semaphores are reusable.
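The arithmetic behind this gain can be made explicit. The helper below (a hypothetical name, `max_machines`) assumes the limits quoted in the example above (1649 message queues, 4096 memory blocks), one queue and one block per channel in the conventional scheme, and one queue plus two blocks per TBA in the scheme of the invention; semaphores are left out, since the text treats their limit as non-binding.

```python
def max_machines(tba_per_machine, channels_per_tba, per_channel,
                 max_queues=1649, max_blocks=4096):
    """Largest number of machines whose queue and block needs fit the IPC limits."""
    if per_channel:
        # Conventional scheme: one queue and one block per channel.
        queues = blocks = tba_per_machine * channels_per_tba
    else:
        # Scheme of the invention: one queue and two blocks (data + descriptive) per TBA.
        queues = tba_per_machine
        blocks = 2 * tba_per_machine
    return min(max_queues // queues, max_blocks // blocks)

print(max_machines(4, 40, per_channel=True))   # 10
print(max_machines(4, 40, per_channel=False))  # 412
```

With 160 queues per machine, the conventional scheme stalls at 10 machines; with 4 queues and 8 blocks per machine, the queue limit is reached only at 412 machines, which is consistent with the "several hundred machines" mentioned below.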

Thus, constellations of several hundred machines become possible. The other big difference from traditional solutions is that the programmer no longer has to worry about sockets, synchronization or exchange management. The MBA takes care of this.

Large shared memory structure

Having a single queue per TBA does not pose major problems: it suffices for the BA to be instructed to process any request from a task immediately, so as to empty the queue (which is of the FIFO type) as quickly as possible.

In contrast, the "multi-channel" management of memory is more complex.

The Applicant has noticed that the ratio between the maximum number of shared memory blocks and the maximum number of message queues is greater than two. This makes it possible to associate two memory blocks with each queue (and therefore with each TBA).

To be able to implement this method, the shared memory associated with a TBA is composed of two parts: a part dedicated to the storage of the task data proper (data on which the processing units act during the execution of a task), and a part dedicated to the storage of descriptors of the structure of the first part.

In other words, the first block associated with the TBA is dedicated to the storage of task data and is called the "data block", and the second block associated with the TBA is dedicated to the storage of descriptors and is called the "descriptive block".

Thus, instead of associating a complete IPC memory block with a channel of the TBA, a partition of the first block is associated with it, the data making it possible to identify this partition (the "descriptor" of this partition) being stored in the second block.

A method of using shared memory by a task

The first part of the method according to the invention thus consists in the allocation by the application bus to the task of a resource triplet (IPC) comprising a semaphore, a shared memory area and a queue, the semaphore indicating a blocking state of the task, the shared memory area being a partition of a first shared memory block, a descriptor of the shared memory area being stored in a second shared memory block, said second block being dedicated to the storage of descriptors of the first shared memory block.

When the application bus finds that said allocated shared memory area is free (i.e. that a previous task no longer needs it) and/or that the task has moved to the head of the queue, the semaphore is modified so as to indicate a release state of the task.

The memory area is then available for use by the data processing unit assigned to the execution of the task, without any risk of this use being disturbed by another task. It should be noted that when the memory is available, it is not necessarily used immediately. Likewise, once the task has been completed, its result may be requested somewhat later. Preparing memory uses "in advance" is thus a strength of the method according to the invention, and yields an effective acceleration. This will be detailed further in the following description.

The final step of the method consists in freeing the space of the second shared memory block allocated to the storage of the descriptor of the shared memory area, so that it can be assigned again to the storage of the descriptor of the shared memory area that will be used for another task.

Assigning an IPC Resource Triplet

The management of all IPC resources should preferably be the responsibility of a single entity. This ensures consistent management by a single creator process. This role is assigned to the MBA, the BA's multiplexer.

In order for a process to obtain an IPC resource for each of its tasks, it must request it from the MBA via a well-defined protocol designed to optimize the use of IPC resources.

To this end, the MBA assigns the resource triplet of step (a). The process is then said to be hosted ("login").

At the end of this hosting request, the process is either rejected or accepted. If it is accepted (successful login), the MBA puts a master TBA (Tm) triplet at its disposal.

The Tm will exploit these minimal resources to present an extensive collection of logical channels of exchange.

Moreover, any master process hosted on the MBA can launch other, so-called slave, processes, which may themselves require a Slave Terminal (Ts) similar to the Tm. The disappearance of the master tells the MBA that it can recover the resources allocated to the master + slave group.
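The text leaves the hosting protocol itself unspecified. Purely as an illustration, the bookkeeping it implies (accept or reject a login, grant a Tm triplet, attach Ts triplets to a master, reclaim the whole group when the master disappears) might look like the sketch below. The class `Multiplexer`, its methods, the `max_triplets` capacity and the integer triplet ids are all hypothetical.

```python
import itertools

class Multiplexer:
    """Hypothetical MBA bookkeeping of hosted master/slave groups."""

    def __init__(self, max_triplets):
        self.max_triplets = max_triplets      # assumed capacity, e.g. derived from IPC limits
        self._next_id = itertools.count(1)
        self.groups = {}                      # master pid -> triplet ids (Tm first, then Ts)

    def _grant(self):
        in_use = sum(len(ids) for ids in self.groups.values())
        return None if in_use >= self.max_triplets else next(self._next_id)

    def login_master(self, pid):
        tid = self._grant()
        if tid is not None:                   # None means the hosting request is rejected
            self.groups[pid] = [tid]          # Tm triplet
        return tid

    def login_slave(self, master_pid):
        tid = self._grant()
        if tid is not None:
            self.groups[master_pid].append(tid)  # Ts triplet attached to its master
        return tid

    def master_gone(self, pid):
        # The master's disappearance frees the whole master + slave group.
        return self.groups.pop(pid, [])
```

Grouping slaves under their master is what lets a single event (the master disappearing) release every resource of the group at once.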

An application is hosted on the BA (all the machines linked by their MBAs). The applications communicate with each other through the BA, which manages all the shared resources.

Moreover, each of the three IPC resources is advantageously identified by a single integer within the operating system considered. This identifier is unique for each type of IPC resource (semaphore, shared memory, message queue), but the same value can be reused from one type to another.

There are three different ways to generate an identifier:

Fixed identifier

The easiest way for many processes to share an ID is to fix it. However, this solution is not recommended since there is no guarantee that this identifier will not be used by a foreign process.

Identifier calculated by a private key

It is possible to ask the system to calculate an identifier from a key. To work in private mode, the IPC_PRIVATE constant is passed in place of the numeric key, as the first argument of the function specific to the IPC resource type. In this case, the operating system automatically generates the identifier and guarantees its exclusivity.

Nevertheless, the sharing of this resource remains feasible if the creator process makes available to other processes the identifier of this resource.

Identifier calculated by a dynamic key

If, instead of using the IPC_PRIVATE flag, a key is supplied that is obtained by a means known to all the processes sharing the resource, each process can then calculate the identifier. This is the solution advantageously preferred by the invention.

On Linux, the ftok() function dynamically generates an IPC key from two parameters: the name of an existing file (often the name of the project or module), and a character used to distinguish several keys of the same module: for the same file name, two different keys are generated if this character differs. This function brings Windows (which only works with file names) closer to Linux.
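To show why every process can recompute the same key without any exchange, the sketch below reproduces the bit layout used by glibc's implementation of ftok() (low 16 bits of the file's inode number, low 8 bits of its device number, low 8 bits of the project character). It is an illustration written in Python, not a binding to the C function; other C libraries may compute the key differently, and `ftok_like` is a hypothetical name.

```python
import os

def ftok_like(path, proj_id):
    """Same bit layout as glibc's ftok(); returned as an unsigned 32-bit integer."""
    st = os.stat(path)                    # the file must exist, as with ftok()
    return ((st.st_ino & 0xFFFF)          # low 16 bits of the inode number
            | ((st.st_dev & 0xFF) << 16)  # low 8 bits of the device number
            | ((proj_id & 0xFF) << 24))   # low 8 bits of the project character
```

For example, `ftok_like(path, ord("A"))` and `ftok_like(path, ord("B"))` differ only in their top byte. Since only a few bits of the inode and device numbers are kept, two different files can yield the same key, so the key is not guaranteed to be unused; the C function also returns a signed `key_t`, which is negative when the project character is 0x80 or above.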

Whether on Linux or Windows, creating an IPC resource requires the following two basic steps:

• Obtaining an identifier from a key (represented for the triplet of identifiers in FIG. 2);

• Creating the IPC resource and obtaining a reference to the resource.

An identifier is for example obtained through a system call offered by the System V IPC interface standardized by POSIX (semget, shmget, msgget) or by Windows (CreateSemaphore, MapViewOfFile, CreateMailslot).

Based on the identifier generated in the previous step, the IPC resource is created through a second system function reserved for this purpose.

Partitioning policies of the first block

The method according to the invention proposes two different policies for partitioning memory space:

Fixed partitioning

This method consists in dividing the memory space into several partitions of the same size or different size. The size and number of partitions are fixed in advance and the partitioning is done at the start of the operating system.

This method lacks flexibility and causes internal fragmentation. Indeed, the sizes of the memory requested rarely correspond to the sizes of the fixed partitions, and the difference constitutes unusable memory space. Moreover, if the requested size exceeds that of the partitions, operations replacing memory areas (overlaying) become necessary.

Dynamic partitioning

Preferably, the partition of the first block allocated as a shared memory area for the task is dynamically generated according to a memory space size requested by the data processing unit.

Dynamic partitioning thus consists in creating memory areas whose size varies dynamically. Memory is allocated according to reservation and release requests, but also according to the available memory size. This allocation policy creates what is called external fragmentation, which is due to unused partitions (internal fragmentation is the unused space inside an allocated partition, while external fragmentation consists of unassigned partitions). Nevertheless, the impact of internal fragmentation is significantly reduced. Figure 3 illustrates the phenomenon of fragmentation (internal/external).

It is always possible to gather all unused spaces into a single partition; this technique is known as memory compaction. Memory compaction is strongly discouraged because it monopolizes the CPU, the RAM and the system bus simultaneously during the data transfer operations.

Memory allocation policies for dynamic partitioning

When several free partitions are available, the one that best matches the request is chosen; otherwise, a new one is created. Here are some examples of allocation algorithms:

First Fit

We search the list of free partitions for the first partition large enough to contain the requested space. This algorithm is both fast and simple to implement, but causes a lot of external fragmentation because the blocks at the end of the list are difficult to reach.

Next Fit

This algorithm is very similar to the one previously described except that the search starts from the position of the last selected partition.

Best Fit

We are looking for the partition whose size is closest to the size of the requested space. This algorithm tends to preserve large partitions in case they can be exploited later. The disadvantage of this algorithm is that the memory will be split into several small non-exploitable areas.

Worst Fit

We look for the largest free partition, so as to leave the largest unused space. This space will be used to create new partitions.
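The four selection rules above differ only in which fitting hole they pick. The sketch below assumes free space is kept as a Python list of `(start, length)` holes in address order; the functions `choose` and `allocate` and the policy names are hypothetical illustrations of the textbook algorithms, not the method's own code.

```python
def choose(free, size, policy, last=0):
    """Index of the hole in `free` (list of (start, length)) picked by `policy`, or None."""
    fits = [i for i, (_, length) in enumerate(free) if length >= size]
    if not fits:
        return None
    if policy == "first":
        return fits[0]
    if policy == "next":
        # Resume the search from the last selected position, wrapping around.
        return next((i for i in fits if i >= last), fits[0])
    if policy == "best":
        return min(fits, key=lambda i: free[i][1])
    if policy == "worst":
        return max(fits, key=lambda i: free[i][1])
    raise ValueError(f"unknown policy {policy!r}")

def allocate(free, size, policy, last=0):
    """Carve `size` units out of the chosen hole; return the start address or None."""
    i = choose(free, size, policy, last)
    if i is None:
        return None
    start, length = free[i]
    if length == size:
        del free[i]
    else:
        free[i] = (start + size, length - size)  # the remainder stays in the free list
    return start
```

With holes of 50, 200, 120 and 300 units and a request of 110, best fit picks the 120-unit hole and leaves a 10-unit remainder, which is exactly the kind of small, hard-to-use area the text attributes to this algorithm.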

Buddy System

This algorithm considers that memory partitions have a size of $2^n$. When there are no more partitions of size $2^x$, a partition of size $2^{x+1}$ is subdivided; if there are no more partitions of size $2^{x+1}$, a partition of size $2^{x+2}$ is cut, and so on. When two adjacent partitions of the same size are released, they are grouped into a single partition of twice the size. This algorithm is fast and constitutes a good compromise between the advantages and disadvantages of the algorithms described above. However, having to round partition sizes up to a power of 2 causes some internal fragmentation. Table 1 below describes an example memory allocation sequence:

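Total memory size: 1 MB (occupied blocks are labelled with the name of their owner)

Step                 Memory layout after the step
Request A = 111k     A:128k | 128k | 256k | 512k
Request B = 230k     A:128k | 128k | B:256k | 512k
Request C = 52k      A:128k | C:64k | 64k | B:256k | 512k
Request D = 256k     A:128k | C:64k | 64k | B:256k | D:256k | 256k
Release B            A:128k | C:64k | 64k | 256k | D:256k | 256k
Release A            128k | C:64k | 64k | 256k | D:256k | 256k
Request E = 80k      E:128k | C:64k | 64k | 256k | D:256k | 256k
Release C            E:128k | 128k | 256k | D:256k | 256k
Release E            512k | D:256k | 256k
Release D            1 MB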

Table 1
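The splitting and merging rules can be checked by replaying the sequence of Table 1 (A = 111k, B = 230k, C = 52k, D = 256k, E = 80k on 1 MB). The sketch below is a textbook buddy allocator written for that purpose, with sizes in KB; the class name, the choice of the lowest free address, and the label format are assumptions made for the illustration.

```python
class BuddyAllocator:
    """Buddy-system sketch; sizes are in KB and the total must be a power of two."""

    def __init__(self, total):
        assert total & (total - 1) == 0, "total size must be a power of two"
        self.total = total
        self.free = {total: {0}}   # block size -> offsets of free blocks of that size
        self.used = {}             # owner -> (offset, size)

    def alloc(self, owner, request):
        size = 1
        while size < request:      # round the request up to a power of two
            size *= 2
        candidates = [s for s, offs in self.free.items() if s >= size and offs]
        if not candidates:
            raise MemoryError(f"no free block of {size}k")
        s = min(candidates)        # smallest free block that is large enough
        off = min(self.free[s])    # lowest address first
        self.free[s].remove(off)
        while s > size:            # split, keeping the upper half free each time
            s //= 2
            self.free.setdefault(s, set()).add(off + s)
        self.used[owner] = (off, size)

    def release(self, owner):
        off, size = self.used.pop(owner)
        while size < self.total:
            buddy = off ^ size     # a buddy differs only in the bit equal to the block size
            if buddy not in self.free.get(size, set()):
                break
            self.free[size].remove(buddy)
            off, size = min(off, buddy), size * 2  # merge into a block twice as large
        self.free.setdefault(size, set()).add(off)

    def layout(self):
        blocks = [(off, f"{owner}:{size}k") for owner, (off, size) in self.used.items()]
        blocks += [(off, f"{size}k") for size, offs in self.free.items() for off in offs]
        return [label for _, label in sorted(blocks)]
```

Replaying the ten steps in order and printing `layout()` after each one yields the ten rows of the table, ending with a single free `1024k` block.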

Non-contiguous allocation

The algorithms seen so far consider partitions as a continuous space. These algorithms generate a lot of internal or external fragmentation. To remedy this problem, the first shared memory block is advantageously structured in a set of units of equal size called "pages" each referenced by an index, the partition of the shared memory allocated as a shared memory area for the task being a set of pages, said descriptor of the shared memory area being a page table containing indexes of pages composing said partition.

In other words, the allocation spaces are considered to be non-contiguous; this is called pagination.

The principle of pagination is to structure the memory space into a set of equal sized units called pages. The memory space allocated to a partition will then consist of a set of pages that are not necessarily successive. A partition is no longer identified by its starting address and its size, but by all the pages that compose it, these pages can be organized as a table called a page table that describes the reference to the partition. Figure 4a shows a first block organized in pages and the second associated block.
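Once a partition is described by its page table, a position inside the partition is translated into a position in the first block by a division by the page size. The sketch below assumes the data block is a Python `bytearray` and the page table a list of page indexes; `PAGE_SIZE`, `to_block_offset` and `read` are hypothetical names, and the default page size is an arbitrary choice since the text leaves it as a parameter.

```python
PAGE_SIZE = 4096  # hypothetical default; the text makes the page size a parameter

def to_block_offset(page_table, offset, page_size=PAGE_SIZE):
    """Map a byte offset inside a partition to a byte offset in the data block."""
    page_no, within = divmod(offset, page_size)
    if page_no >= len(page_table):
        raise IndexError("offset lies beyond the partition")
    return page_table[page_no] * page_size + within

def read(data_block, page_table, offset, n, page_size=PAGE_SIZE):
    """Read n bytes of a partition whose pages are scattered in the data block."""
    out = bytearray()
    while n > 0:
        pos = to_block_offset(page_table, offset, page_size)
        chunk = min(n, page_size - offset % page_size)  # never cross a page boundary
        out += data_block[pos:pos + chunk]
        offset += chunk
        n -= chunk
    return bytes(out)
```

For instance, with 4-byte pages over `b"AAAABBBBCCCCDDDD"` and the page table `[2, 0]`, the partition reads as `CCCCAAAA` even though its pages are neither adjacent nor in order.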

The principle of pagination is often combined with other notions such as segmentation (which groups page tables together) or swapping (use of disk space for loading and unloading pages). These techniques are the building blocks of virtual memory, which allows the execution of programs whose size exceeds that of the physical memory. In general, memory space is often poorly exploited by contiguous approaches. Indeed, internal/external fragmentation problems are very difficult to circumvent: there is always a compromise between the efficiency of the deployed algorithm and the rate of internal/external fragmentation. There is no ideal algorithm in a highly dynamic system.

Non-contiguous (page-based) allocation approaches allow good use of address space. When talking about virtual memory, paging also allows:

• Loading on demand. Pages are only loaded when they are referenced.

• Extension of the real (physical) address space.

• Transparency. The user no longer has to explicitly manage his partitions.

Pagination completely eliminates the problem of external fragmentation since all pages are the same size. However, the internal fragmentation problem may occur in the last page of a partition if it is not fully populated.

Structure of the second block

In order to be able to evaluate the performance of our system, the page size is a configurable parameter. In practice, the page size is defined according to the size of the data exchanged (data structures). The smaller the pages, the more pages there will be, and therefore the larger the page tables become. The right balance must always be found between the page size and the overall size of the first IPC memory block.
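The trade-off can be quantified with a few lines of arithmetic. The sketch below counts, for a set of partition sizes in bytes, the pages used, the bytes lost to internal fragmentation in partially filled last pages, and the size of the page tables; `paging_cost` is a hypothetical name, and the 8-byte entry assumes each page-table entry is a (page index, offset) pair of two 32-bit integers.

```python
def paging_cost(partition_sizes, page_size, pair_size=8):
    """Pages used, bytes lost in partially filled last pages, and page-table bytes."""
    pages = sum(-(-size // page_size) for size in partition_sizes)  # ceiling division
    wasted = pages * page_size - sum(partition_sizes)
    return pages, wasted, pages * pair_size

print(paging_cost([1000, 5000, 300], 4096))  # (4, 10084, 32)
print(paging_cost([1000, 5000, 300], 512))   # (13, 356, 104)
```

Dividing the page size by eight here cuts the wasted space from about 10 KB to 356 bytes, at the price of page tables more than three times larger.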

Navigating in a shared memory block requires prior knowledge of its composition: number of available partitions, size of the memory pages, pages assigned to each allocated partition, etc. This information is the description of a memory block; it can be either integrated directly into the first block itself (data block), or integrated into the second block (descriptive block). As explained, each partition described in the descriptive block corresponds to a logical channel.

But as can be seen in FIG. 4b (which takes the example of FIG. 4a in the case where the descriptor of the memory area of the first block relating to channel 1 is deleted), the implementation of step (d) requires a fairly heavy memory compaction if the page table of a partition of the first block is itself a contiguous partition of the second block.

The descriptive block is advantageously structured so as to avoid as far as possible the problems related to memory compaction and to internal and external fragmentation. Shifting memory areas is avoided as much as possible, since this operation is very expensive in CPU time. To achieve this, a page table containing the indexes of the pages composing a partition is for example represented in the second memory block by a linked list of pairs, each pair comprising a page index and the memory displacement (or "offset") to be applied in the second block to reach the next pair of the linked list.
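A minimal sketch of such a descriptive block follows, assuming it is a `bytearray` of fixed-size slots, each holding one pair packed as two little-endian 32-bit integers (page index, relative displacement in bytes to the next pair, with 0 marking the end of the list). The free-slot heap and the `heads` dictionary giving each channel's first pair are assumptions for the illustration; the text does not say where these are kept.

```python
import heapq
import struct

# One pair = (page index, displacement in bytes to the next pair); displacement 0 ends the list.
PAIR = struct.Struct("<ii")

class DescriptiveBlock:
    def __init__(self, n_slots):
        self.buf = bytearray(PAIR.size * n_slots)  # stand-in for the second IPC block
        self.free_slots = list(range(n_slots))     # min-heap of free pair slots
        self.heads = {}                            # channel -> byte offset of its first pair

    def store(self, channel, pages):
        if not pages:
            raise ValueError("a partition has at least one page")
        if len(pages) > len(self.free_slots):
            raise MemoryError("not enough free pairs")
        offsets = [heapq.heappop(self.free_slots) * PAIR.size for _ in pages]
        for i, page in enumerate(pages):
            disp = offsets[i + 1] - offsets[i] if i + 1 < len(offsets) else 0
            PAIR.pack_into(self.buf, offsets[i], page, disp)
        self.heads[channel] = offsets[0]

    def pages(self, channel):
        result, off = [], self.heads[channel]
        while True:
            page, disp = PAIR.unpack_from(self.buf, off)
            result.append(page)
            if disp == 0:
                return result
            off += disp

    def free(self, channel):
        """Step (d): recycle the channel's pairs; other channels' pairs never move."""
        off = self.heads.pop(channel)
        while True:
            _, disp = PAIR.unpack_from(self.buf, off)
            heapq.heappush(self.free_slots, off // PAIR.size)
            if disp == 0:
                return
            off += disp
```

After freeing a channel, the next page table reuses the recycled slots even when they are not contiguous, and the pairs of the other channels stay at the same offsets, so no compaction is ever performed.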

The page table of a channel is thus no longer represented by a contiguous partition. Any update operation on the partition can be carried out by simply modifying the chaining links. No memory compaction is necessary.

P-Instructions

High-performance methods are known for splitting a process into a plurality of tasks and scheduling them for parallel execution by the different data processing units (processors, cores, etc.). Reference may be made, for example, to patent applications FR 2963125 and FR 2963126.

As explained, the memory made available to a task may be used only much later, namely when the processing unit actually needs the result of the task, that is, at synchronization time. The use and subsequent release of the memory can thus be seen as a "response" following a "request".

Thus, a pair of instructions, called "P-instructions", is advantageously associated with the task: the first instruction is a non-blocking instruction requesting the execution of the task (the "request"), and the second instruction is a blocking instruction returning the result of the execution of the task (the "response"). Step (a) is implemented following the launch of the first instruction, and step (d) following the launch of the second instruction.
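The request/response pair can be pictured with Python's `concurrent.futures`. In the sketch below, a hypothetical `Bus` class stands in for the application bus. `request` performs a simplified step (a) by assigning a memory area and returns at once, while `response` blocks on the result and then performs a simplified step (d) by releasing the area's entry. The semaphore and queue of the resource triplet (step (b)) are not modelled, and a `bytearray` replaces real shared memory.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class Bus:
    """Hypothetical stand-in for the application bus."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._next_handle = 0
        self.allocated: dict[int, bytearray] = {}  # handle -> assigned area

    def request(self, task: Callable[[bytearray], object],
                size: int) -> tuple[int, Future]:
        """First P-instruction: non-blocking."""
        with self._lock:
            handle = self._next_handle
            self._next_handle += 1
            area = bytearray(size)        # simplified step (a): assign an area
            self.allocated[handle] = area
        return handle, self._pool.submit(task, area)  # step (c) on a worker

    def response(self, handle: int, future: Future) -> object:
        """Second P-instruction: blocking."""
        result = future.result()          # wait for the task's result
        with self._lock:
            del self.allocated[handle]    # simplified step (d): release
        return result

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def square_sum(area: bytearray) -> int:
    # Example task: fill its area, then compute from it.
    for i in range(len(area)):
        area[i] = i
    return sum(b * b for b in area)


if __name__ == "__main__":
    bus = Bus()
    h, f = bus.request(square_sum, 4)  # returns immediately
    # ... the caller is free to do other work here ...
    print(bus.response(h, f))          # 0 + 1 + 4 + 9 = 14
    bus.close()
```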

Given a process written in the form of a sequence of tasks, each task being an elementary action executable by a given processing unit to which said application bus is connected, and the tasks being ordered according to possible dependency constraints with respect to other tasks, the process can be executed in parallel provided that the scheduling order is respected when executing the tasks.
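One simple way to respect such dependency constraints is a level-by-level topological sort. It is shown here as an illustration, not as the scheduling method of the cited applications. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), and the input format, a mapping from each task to the tasks it depends on, is an assumption. Tasks in the same wave have no mutual dependencies and may be dispatched to different processing units at once.

```python
from graphlib import TopologicalSorter


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into successive waves (hypothetical helper).

    deps: task -> set of tasks it depends on
    Every task in a wave has all its dependencies in earlier waves.
    """
    ts = TopologicalSorter(deps)
    ts.prepare()                       # raises CycleError if no valid order
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for a stable output
        waves.append(ready)
        ts.done(*ready)                 # unlocks the tasks depending on these
    return waves


if __name__ == "__main__":
    deps = {"T2": {"T1"}, "T3": {"T1"}, "T4": {"T2", "T3"}}
    print(execution_waves(deps))  # [['T1'], ['T2', 'T3'], ['T4']]
```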

According to a second aspect, the invention thus proposes a method for parallel execution of a computer process by a plurality of data processing units connected by an application bus, the process being written in the form of a sequence of tasks, each task being an elementary action executable by a given processing unit to which said application bus is connected, the tasks being ordered according to possible dependency constraints with respect to other tasks, the parallel execution method respecting the order indicated by the scheduling table, and the execution of each task comprising the implementation of the method according to the first aspect of the invention for using, by the task, a memory shared by said data processing units.

P-instructions are particularly well suited to controlling the sequence of tasks: to obtain automatic synchronization, it is sufficient that the second instruction of a task (T) be launched synchronously with the request for execution (the first instruction, in the case of a P-instruction) of a second task (T2) that has a dependency constraint with respect to the first task (T).
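The synchronization rule can be sketched with futures. In this hypothetical example, `request` and `response` are minimal wrappers around `concurrent.futures` that play the roles of the first and second instructions. The response of task T is issued at the very point where the request of T2 is made, because T2 takes T's result as its argument, so no separate barrier or lock is needed.

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)


def request(fn, *args) -> Future:
    """First instruction (hypothetical): launch without blocking."""
    return pool.submit(fn, *args)


def response(future: Future):
    """Second instruction (hypothetical): block until the result is ready."""
    return future.result()


def produce() -> int:        # task T
    return 6 * 7


def consume(x: int) -> int:  # task T2, depends on T
    return x + 1


t = request(produce)
# T's response is issued together with T2's request,
# which is what synchronizes T2 on T.
t2 = request(consume, response(t))
print(response(t2))  # 43
pool.shutdown()
```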

Systems

According to a third aspect, the invention proposes systems that can implement the methods according to the first or second aspect of the invention.

The first function required of the application bus is to route all data between all the actors of all the interconnected configurations.

All the actors can be gathered in a single workstation, the processing unit of the application bus then being that of the workstation. In this case, the system according to the invention comprises the workstation, which first of all comprises data display means and data acquisition means, typically a screen, and a keyboard with a mouse. This hardware is simply used to implement one or more human-machine interfaces allowing a user to interact with the BA, possibly by submitting processes to execute. The system according to the invention also comprises a data processing unit, which is the unit connected to the application bus, and a memory. For the system to be of interest, the processing unit should preferably be a multicore processor (it is then regarded as a plurality of processing units), that is to say a processor able to take advantage of parallel execution. The memory is advantageously a single shared memory.

This workstation hosts the NBA if there is one.

Alternatively, the system according to the invention need not be limited to a single workstation, but may comprise, as explained above, at least one partner machine. In addition, there may be several workstations controlled by users, each having its own processing unit and all using the same partner machines around a single application bus.

The application bus serves multiple stations (users) and multiple partners (PLCs) in the MSMP configuration. It operates in the various configurations, referred to as "degraded", combining Single/Multiple Station with Single/Multiple Partner (SSSP, SSMP, MSSP and MSMP). Finally, the application bus is able, as far as possible, to support the real-time links required by certain devices (peripherals, production-line control, etc.). It thus bridges the industrial world and the office world, for example to obtain a real-time picture of production.

The bus uses the existing connections and physical media between the usual types of machines (file server, web server, application server, etc.). In particular, two TCP/IP ports (12 and 14, or 3012 and 3014) can be reserved for it, to maximize its throughput and optimize its communications. Communications are multiplexed and each segment has a priority; task synchronization messages, as well as any real-time links, have the highest priority.
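Priority multiplexing of this kind can be sketched with a heap. The Python example below works under stated assumptions: the `Multiplexer` class, the three priority levels and the channel labels are hypothetical, and encryption and the TCP transport are left out. A sequence counter keeps segments of equal priority in arrival order, so synchronization and real-time segments overtake ordinary data while segments of the same priority are never reordered.

```python
import heapq
import itertools

HIGHEST = 0     # task synchronization messages and real-time links
NORMAL = 1      # ordinary data exchanges
BACKGROUND = 2  # e.g. deferred saving of traces


class Multiplexer:
    """Hypothetical priority multiplexer for segments sharing one port."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, bytes]] = []
        self._seq = itertools.count()  # arrival order within a priority

    def push(self, priority: int, channel: str, segment: bytes) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), channel, segment))

    def pop(self) -> tuple[str, bytes]:
        # Lowest priority value first; ties broken by arrival order.
        _, _, channel, segment = heapq.heappop(self._heap)
        return channel, segment

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    m = Multiplexer()
    m.push(NORMAL, "data", b"a")
    m.push(HIGHEST, "sync", b"s")
    m.push(NORMAL, "data", b"b")
    while m:
        print(m.pop())  # ('sync', b's'), then ('data', b'a'), ('data', b'b')
```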

The scheduling operations of the application bus can optionally be handled by a computing unit, ideally a vector processor, which is a processing unit of one of the partner machines and may be dedicated. In mid-range configurations, a graphics card located on a partner machine is used for GPGPU (General-Purpose computation on Graphics Processing Units), since a graphics card is a complete vector processor.

Since the application bus is a priori distributed, it preferably operates in a secure environment. The multiplexer/demultiplexer is therefore provided with a strong encryption function. The scheme used is encryption with random keys of variable length that are refreshed automatically, so that any attack is thwarted by a key change occurring in less time than a key search would require. There is one key per port and per direction of transmission. The transmission of each new randomly generated key is itself secured by the previous key. Thus, even if the initial key (used only once, at login) were known, it would not allow subsequent exchanges to be penetrated.

The robustness of the whole is subject to a quality objective, which is why the application bus particularly advantageously traces its operations and transactions. This tracing is of course not a priority and does not affect overall performance: it is handled as an acquisition of data stored in cache and then saved during lower-priority periods. An ancillary tool can use these data on demand to search the history of any event, to extract any useful statistics, or to fine-tune the parameters of the application bus so as to optimize its operation for a given configuration.
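A tracer that stays off the critical path can be sketched as follows. The `TraceBuffer` class is hypothetical: records are appended to an in-memory cache on the hot path, and the caller decides when a lower-priority period has come. At that point the cached records are written out as JSON lines to any text stream, where an ancillary tool could later query them.

```python
import io
import json
import time


class TraceBuffer:
    """Hypothetical tracer that keeps tracing off the critical path."""

    def __init__(self, sink: io.TextIOBase) -> None:
        self._sink = sink             # where records are eventually saved
        self._cache: list[dict] = []  # records acquired but not yet saved

    def trace(self, event: str, **details) -> None:
        # Hot path: only append to the in-memory cache.
        self._cache.append({"time": time.time(), "event": event, **details})

    def flush_when_idle(self, idle: bool) -> int:
        # Save cached records only during a lower-priority period;
        # returns the number of records written.
        if not idle:
            return 0
        for record in self._cache:
            self._sink.write(json.dumps(record) + "\n")
        written, self._cache = len(self._cache), []
        return written


if __name__ == "__main__":
    out = io.StringIO()
    tracer = TraceBuffer(out)
    tracer.trace("alloc", channel=1, pages=3)
    tracer.trace("free", channel=1)
    print(tracer.flush_when_idle(idle=False))  # 0: busy, nothing written
    print(tracer.flush_when_idle(idle=True))   # 2
    print(out.getvalue())
```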

Claims

1. A method of using, by a task, a memory shared between a plurality of data processing units connected by an application bus, said task being executed by one of the data processing units, the method being characterized in that it comprises steps of:

(a) assignment by the application bus to the task of a triplet of resources comprising a semaphore, a shared memory area and a queue, the semaphore indicating a blocking state of the task, the shared memory area being a partition of a first shared memory block, a descriptor of the shared memory area being stored in a second shared memory block, said second block being dedicated to the storage of descriptors of the first shared memory block;

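(b) when the application bus finds that said assigned shared memory area is free and/or that the task has moved to the head of the queue, modification of the semaphore so as to indicate a released state of the task;

(c) use by the data processing unit of said assigned shared memory area for the execution of the task;

(d) freeing of the space of the second shared memory block allocated to the storage of the descriptor of the shared memory area.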

2. The method of claim 1, wherein the partition of the first block assigned as shared memory area for the task is dynamically generated based on a memory space size requested by the data processing unit.

3. The method of claim 2, wherein the first shared memory block is structured into a set of units of given size, called "pages", each referenced by an index, the partition assigned as shared memory area for the task being a set of pages, and said descriptor of the shared memory area being a page table containing the indices of the pages composing the partition.

4. The method of claim 3, wherein a page table containing the indices of the pages composing a partition is represented in the second memory block by a linked list of pairs, each pair comprising a page index and the memory offset to be applied in the second block to reach the next pair in the linked list.

5. Method according to one of the preceding claims, wherein step (a) comprises obtaining a triplet of identifiers from an IPC key, the resource triplet assigned to the task being identified by the triplet of identifiers.

6. The method of claim 5, wherein each identifier is a unique integer for each resource type of the triplet.

7. Method according to one of the preceding claims, wherein the task is associated with a first instruction and a second instruction, the first instruction being a non-blocking instruction requesting the execution of the task and the second instruction being a blocking instruction returning the result of the execution of the task, step (a) being implemented following the launch of the first instruction, and step (d) being implemented following the launch of the second instruction.

8. A method of parallel execution of a computer process by a plurality of data processing units connected by an application bus, the process being written in the form of a sequence of tasks, each task being an elementary action executable by a given processing unit to which said application bus is connected, the tasks being ordered according to possible dependency constraints vis-à-vis other tasks, the parallel execution method respecting the order indicated by the scheduling table, the execution of each task comprising the implementation of the method of using, by the task, a memory shared by said data processing units according to one of claims 1 to 7.

9. The method of claim 8, wherein the execution of a plurality of tasks (T) comprises implementing the method of using, by the task, a memory shared by said data processing units according to claim 7, the launch of the second instruction of such a task (T) being done synchronously with the request for execution of a second task (T2) having a dependency constraint on the first task (T).

10. System comprising a plurality of data processing units implementing the method of using, by a task, a shared memory according to one of claims 1 to 7 or the method of parallel execution of a computer process according to claim 8 or 9, at least one memory connected to a data processing unit, and an input receiving the plurality of tasks to be executed.