What is needed to configure an integration?

The only things a MongoDB integration needs are a live MongoDB Server instance the user has access to and a set of credentials. Specifically, the required arguments are:
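While the exact argument list depends on the deployment, a minimal connection sketch using pymongo could look like the following. Every value below (host, port, credentials, database name) is a hypothetical placeholder, not the integration's actual configuration.

```python
from pymongo import MongoClient

# All connection values below are hypothetical placeholders; the real
# integration asks for its own set of arguments.
client = MongoClient(
    host="mongo.example.com",
    port=27017,
    username="etl_user",
    password="s3cret",
    authSource="admin",   # database used to authenticate the credentials
)

db = client["sales"]                   # hypothetical target database
print(db.list_collection_names())      # quick check that access works
```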

Is there any limitation on databases and data types?

All MongoDB types are mapped to corresponding destination types, depending on the projection mode. Additionally, individual fields with a complex structure (such as nested documents or arrays) may be synchronized as objects rather than as any scalar type.
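To make the distinction concrete, here is a hypothetical source document (field names invented for illustration): the scalar fields map to scalar destination types per the projection mode, while the nested subdocument would be synchronized as an object.

```python
from datetime import datetime

# Hypothetical "sold product units" document; fields are illustrative only.
doc = {
    "amount": 19.99,                          # scalar: maps to a numeric type
    "sold_at": datetime(2024, 1, 15, 10, 30),  # scalar: maps to a date/time type
    "customer": {                              # complex structure: synchronized
        "name": "Ada Lovelace",                # as an object, not flattened
        "tags": ["vip", "newsletter"],         # into scalar columns
    },
}
```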

This is explained in more detail in the next section.

How does the stream replication work?

Two modes are supported: "Key Based" / "Incremental", and "Full Table".

"Full Table" completely wipes out any data in our platform the stream previously downloaded, and then the whole data from the stream's source collection is dumped into our platform. This mode is not recommended for big tables.

"Key Based" (also called "Incremental") requires a particular field to be selected as "key". That field is meant to be treated as a sequential or "sortable" field that increases by time (i.e. records added in the future will have never descending values in this field) and will be tracked across several extraction executions from the (same) integration's stream.

On the first run, the whole collection is downloaded into our platform, and the maximum value of the tracked key (i.e. the value from the last record retrieved) is stored for future runs. This run is time-consuming (as time-consuming as a Full Table replication, but performed only once) and can put heavy resource load on the source servers, so it should be scheduled during whatever the customer deems a low-activity period (e.g. nightly).
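A sketch of that first run, again with hypothetical names and an in-memory stand-in for the destination; sorting ascending by the key field means the last record retrieved carries the maximum value to store:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongo.example.com:27017")  # hypothetical URI
source = client["sales"]["sold_units"]                     # hypothetical collection

destination: list[dict] = []
last_key_value = None          # the "high-water mark" persisted between runs

# First run: full download, sorted ascending by the tracked key ("sold_at"),
# so the last document seen holds the maximum key value.
for doc in source.find({}).sort("sold_at", 1):
    destination.append(doc)
    last_key_value = doc["sold_at"]

# last_key_value would now be saved for use by the next incremental run.
```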

When another extraction later runs for this incremental stream, only a subset of the data is retrieved: the records whose value in the chosen key field is greater than or equal to the last stored value.
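Continuing the sketch, a later incremental run only asks MongoDB for records at or above the stored value, using a $gte filter (names remain hypothetical):

```python
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://mongo.example.com:27017")  # hypothetical URI
source = client["sales"]["sold_units"]                     # hypothetical collection

destination: list[dict] = []
# In practice this value is loaded from the state saved by the previous run;
# a literal is used here only to keep the sketch self-contained.
last_key_value = datetime(2024, 1, 15, 10, 30)

# Retrieve only records whose key is >= the stored value, and advance
# the high-water mark as they stream in.
for doc in source.find({"sold_at": {"$gte": last_key_value}}).sort("sold_at", 1):
    destination.append(doc)
    last_key_value = doc["sold_at"]
```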

As an example, let's say the source collection contains "sold product units", with fields like the ones sketched below (assuming that collection is used on a regular basis, always with the same underlying business logic writing to it):
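This document sketch is purely hypothetical; every field name except sold_at is invented for illustration.

```python
from datetime import datetime

doc = {
    "product_id": "SKU-1234",                  # hypothetical field
    "quantity": 3,                             # hypothetical field
    "unit_price": 19.99,                       # hypothetical field
    "sold_at": datetime(2024, 1, 15, 10, 30),  # timestamp of the sale; never
                                               # decreases for new records
}
```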

Typically, this collection gains new records every day, even every minute. The appropriate stream mode in this case is key-based, choosing the sold_at field as the key. The first extraction will download all the data that exists so far in the source collection, and keep the last, and therefore greatest, value of the sold_at field.