Drupal is one of the most widely used and powerful content management systems in the world. However, as a new developer I found importing content with complex structures to be extremely frustrating, and finding a reliable way to import data into a Drupal system was unnecessarily difficult. A fitting example is the Migrate module that Drupal 8 provides.
Migration plugins are responsible for mapping the data extracted from a source to Drupal's definition of that data. When I took a methodical approach to the migration, I first learned that I needed to create a custom migration plugin to import data. This is done with a YAML file in which we define the source, destination and process plugins. The procedure is straightforward for content types such as 'Article,' but far more complex for structures like custom entities. After learning how to write this configuration, I had to test it by loading it and running the migration, and I realized that for every change I made I had to reload the configuration and run it again. After many trials and errors, I managed to get the migration right for a simple content type with three fields. But when I had to migrate a custom entity with 30 fields, the task seemed impossible to complete within a realistic time frame. This challenge pushed me onto a trail with no footprints and led me to devise two new approaches for performing the complete data import.
The first approach uses the Talend Data Integration tool. Talend is a versatile ETL tool that can connect directly to the Drupal MySQL database.
I used this platform, rather than a Drupal module, to import data into the system. To do this, it is very important to understand the purpose of all the tables in the Drupal database and their relationships with each other. For instance, in Drupal 8, when we import node content, Drupal inserts data into tables such as node, node_revision, node__body, node_field_data, node_access and many more. The official Drupal website has an entity relationship diagram of the Drupal database, which is a good resource for learning its structure.
Every node we create, whatever its content type, gets stored in the node table and related tables in the database.
The node table has the following columns: nid, vid, type, uuid, langcode.
node_revision has columns such as nid, vid, revision_timestamp and revision_log to keep track of every version of a node saved from the front end. The revision_log column stores a message describing the changes made for a given vid. Creating revisions can be turned off by unchecking the 'Create new revision' option.
node_revision__body, node_revision__comment and similar tables are simply revision tables for their respective field tables.
node_field_data stores, for every node (nid), revision (vid) and type (bundle), the default 'title' field and other details such as timestamps. Its columns are nid, vid, type, langcode, status, uid, title, created, changed, promote, sticky, default_langcode and revision_translation_affected.
node__field_image and node__field_tags store the image and tags field data for every node.
node__body stores the body for each node; its columns are bundle, deleted, entity_id, revision_id, langcode, delta, body_value, body_summary and body_format.
node_access stores access data for all nodes.
The following tables give a rough idea of how a node gets stored in the Drupal database. Consider a 'Customer' content type with four fields: Title, Age, Contract Date and Discount. Every time a Customer node is added, changes are reflected in the following tables (a sketch of the corresponding inserts follows the list):
node
node__body
node_field_data
node__field_age
node__field_contract_date
node__field_discount
And all corresponding revision tables.
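To make these table relationships concrete, here is a minimal JDBC sketch of the kind of inserts Approach 1 performs. In practice Talend's database components generate these for you; the table and column names follow the Drupal 8 schema described above, but the connection details, field values and the 'customer' bundle are illustrative assumptions, and the node_revision and revision field tables are omitted for brevity.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.UUID;

// Illustrative sketch only: the kind of INSERTs needed for a single 'Customer' node.
public class CustomerNodeInsertSketch {

    public static void main(String[] args) throws Exception {
        long now = System.currentTimeMillis() / 1000L; // Drupal stores Unix timestamps.

        // Placeholder connection details for your own Drupal MySQL database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/drupal", "drupal_user", "secret")) {
            conn.setAutoCommit(false);

            // 1. Base row in the node table; nid is auto-generated.
            //    A real import must also create the node_revision row and write
            //    its vid back into node.vid; that step is omitted here.
            PreparedStatement node = conn.prepareStatement(
                    "INSERT INTO node (type, uuid, langcode) VALUES (?, ?, ?)",
                    Statement.RETURN_GENERATED_KEYS);
            node.setString(1, "customer");
            node.setString(2, UUID.randomUUID().toString());
            node.setString(3, "en");
            node.executeUpdate();
            long nid;
            try (ResultSet keys = node.getGeneratedKeys()) {
                keys.next();
                nid = keys.getLong(1);
            }
            long vid = nid; // Simplification: vid really comes from node_revision.

            // 2. Title, status and timestamps live in node_field_data.
            PreparedStatement data = conn.prepareStatement(
                    "INSERT INTO node_field_data (nid, vid, type, langcode, status, uid, "
                  + "title, created, changed, promote, sticky, default_langcode) "
                  + "VALUES (?, ?, 'customer', 'en', 1, 1, ?, ?, ?, 0, 0, 1)");
            data.setLong(1, nid);
            data.setLong(2, vid);
            data.setString(3, "Acme Corp");
            data.setLong(4, now);
            data.setLong(5, now);
            data.executeUpdate();

            // 3. One INSERT per custom field table, e.g. node__field_age.
            PreparedStatement age = conn.prepareStatement(
                    "INSERT INTO node__field_age (bundle, deleted, entity_id, revision_id, "
                  + "langcode, delta, field_age_value) VALUES ('customer', 0, ?, ?, 'en', 0, ?)");
            age.setLong(1, nid);
            age.setLong(2, vid);
            age.setInt(3, 42);
            age.executeUpdate();

            // node__field_contract_date, node__field_discount and all the
            // corresponding revision tables need the same treatment.
            conn.commit();
        }
    }
}
```

Even in this stripped-down form, a four-field content type already touches several tables per record, which is exactly the bookkeeping burden the comparison at the end of this article refers to.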
These are the steps for the first approach.
Some people may argue that Approach 1 unnecessarily uses another platform when Drupal already has multiple ways to support data import. My counterargument is that we leave Drupal out of the process and load data directly into the MySQL tables, which gives us complete certainty that the data is now in our system. However, if your import has many fields, it requires inserts into many more tables.
A more sophisticated way of handling that situation is to use the REST API configured in Drupal, with an external platform like Talend making the request calls. This is the second approach I followed, where I used a universal way to talk to the system: the RESTful Web Service.
This approach scales easily, and the same steps apply even when a content type has a large number of fields.
The routine libraries can be added by going to Code > Routines > Create a New Folder > Create a New Routine > Right Click > Edit Routine Libraries > Browse > Add the Library File > Finish.
5. Next, write the Java routine. As a sample, suppose we only want to add the title and body of a content type.
First, import the okhttp library and create the client object. This client plays the same role as Postman: it sends the Request object and captures the reply in a Response object.
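Since the original routine isn't reproduced here, the following is a minimal sketch of what such a routine could look like. It assumes Drupal 8's core REST module is enabled for nodes at /node?_format=json with Basic Authentication; the base URL, credentials and the 'article' bundle are placeholders.

```java
import okhttp3.Credentials;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class DrupalNodeRoutine {

    private static final OkHttpClient CLIENT = new OkHttpClient();
    private static final MediaType JSON = MediaType.parse("application/json; charset=utf-8");

    // Creates a node with just a title and body via Drupal 8's core REST module.
    public static String createNode(String title, String body) throws Exception {
        // Drupal expects each field as an array of value objects. A real routine
        // should build this with a JSON library (or escape the values) rather than
        // concatenating strings.
        String json = "{"
                + "\"type\":  [{\"target_id\": \"article\"}],"
                + "\"title\": [{\"value\": \"" + title + "\"}],"
                + "\"body\":  [{\"value\": \"" + body + "\"}]"
                + "}";

        Request request = new Request.Builder()
                .url("https://example.com/node?_format=json")
                .header("Authorization", Credentials.basic("apiuser", "apipassword"))
                .post(RequestBody.create(JSON, json))
                .build();

        try (Response response = CLIENT.newCall(request).execute()) {
            // A 201 Created status means Drupal accepted the node.
            return response.code() + " " + response.body().string();
        }
    }
}
```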
6. Before creating the routine in Java, you can also construct the call in the Postman client to study and test the POST request. This step assured me that my REST call was constructed correctly, so I could confidently proceed with importing thousands of records into the system. This is how it looks in Postman when importing just the Title and Body of a content type.
7. If you don't know how to get the encoded value for Authorization, you can simply go to Postman > Authorization > Select Type as Basic Auth > Enter Username and Password.
This automatically generates the encoded value in the header, which can be copy-pasted into the Java routine.
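If you would rather generate that value in code than copy it from Postman, the same Base64 string can be produced with standard Java; the credentials below are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    public static void main(String[] args) {
        // Basic Auth is simply "username:password" encoded in Base64.
        String credentials = "apiuser:apipassword";
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        // Postman generates exactly this value for its Authorization header.
        System.out.println("Authorization: Basic " + encoded);
    }
}
```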
8. After writing the Java routine in Talend, read data from the CSV file using the tFileInputDelimited component, as in Approach 1, and join it to a component called tJavaFlex, which allows you to call a Java function. In tJavaFlex, every field value is taken and dynamically passed into the REST body.
The link row2 carries all the fields that we read in the tFileInputDelimited component, and these are passed to the Java function, as sketched below.
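As an illustration, the "Main code" section of tJavaFlex might call the routine roughly like this, assuming the incoming link is named row2, its schema has title and body columns, and the routine lives in a class called DrupalNodeRoutine (all of these names are assumptions).

```java
// tJavaFlex "Main code" section: Talend executes this once per incoming row on row2.
// row2.title and row2.body are the columns defined in the tFileInputDelimited schema.
try {
    // DrupalNodeRoutine is the routine sketched in step 5.
    String result = DrupalNodeRoutine.createNode(row2.title, row2.body);
    System.out.println("Imported record: " + result);
} catch (Exception e) {
    // Log the failing record and keep the job running for the remaining rows.
    System.err.println("Failed to import '" + row2.title + "': " + e.getMessage());
}
```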
9. After running this job, for every record in the datasheet, a REST call is made and data is sent to the system.
This method eliminates the hassle of maintaining complex table relationships in the database. With these approaches, the procedure is the same whether you import Nodes, Users, Taxonomy terms or Custom Entities. They remove the need to keep up with the latest module versions or fix compatibility issues, and they are reliable, straightforward ways to import data into a Drupal system.
Finally, let's compare each data import approach side by side.
| Database Connection Approach | REST Approach |
| --- | --- |
| Very fast: thousands of records take a couple of seconds to import across multiple tables. | Slower than the first approach; sending a few thousand records one at a time over REST may take 10-15 minutes in all. |
| For a content type with 4 fields, building the job takes around 1-2 hours and execution a couple of seconds for a few hundred records. | For a content type with 4 fields, building the job takes around 30-45 minutes and execution a couple of minutes for a few hundred records. |
| Not very scalable: as the data grows, the number of table connections increases. | Very scalable: we simply add more fields to the REST body. |
| More scope for error: manual mapping is involved, so the method fails if it is not done carefully. | Much less scope for error: only a Java routine is involved, so the method works as long as the POST request is constructed correctly. |
| Not very flexible: most of the approach follows a fixed way of doing the import. | Flexible: we can use any data format, REST library, programming language or platform other than Talend. |
| Can be used to import Node, User, Taxonomy and Custom Entity content. | It may not be possible to configure the REST API for Taxonomy, Users, etc. |
| The Talend job isn't easily reusable: if the content type changes, all the field table connections need to be changed, which takes time. | The REST approach is easily reusable: the URL, headers and method stay the same for the same website; only the request body changes for a different content type. |
One of the greatest strengths of an open source CMS like Drupal is that there is often no single 'right' way to solve a problem. These are just a few of the methods developers can use to import content and data in Drupal.
For more on how Fantail Consulting and Technologies uses open source software solutions like Drupal, visit our website or check out our profile on Drupal.org.