
Leo Sorge, May 12, 2021

Watson Studio: IBM Extends Open Source to the Production Phase for All Stakeholders

Data Science
Table Of Contents
  1. Is Open Source ideal for AI use in business?
    • We need to make a change now
  2. The IBM AI ladder is a multi-step information architecture
  3. Data and digital transformation
  4. Cloud Pak for Data as a Service 
  5. Architecture of Cloud Pak for Data as a Service
  6. Watson Studio Tools
  7. Let’s subscribe to Watson Studio!
  8. Starting a new project in CP4DaaS
  9. Data upload
  10. Data analysis
  11. Bringing models into production
  12. Conclusions
  13. Let’s dive into code!

Is Open Source ideal for AI use in business?

By the time you leave university, you will already be using many tools and platforms from the open-source environment. Open Source offers an almost unlimited number of tools, essentially free of cost, and active communities support them, so your development needs as a student or researcher are fully covered.

This is fine for research or study purposes, but the pool of suitable tools and platforms shrinks considerably once you move into production.


The cost-free nature of a platform is no longer its most important feature once you reach the production phase. At that point you begin to see the gap between the learning universe and the working universe, though it may still seem that your open-source environment should cover every phase of production.

Why make a change from something that already works, and that’s open, to a closed environment you don’t know yet?

There are many good reasons to make this change. In production, you need security, governance, and integration to establish your professionalism with the customer. Your company needs to manage data – at least, many companies do, though few use data properly.

Appropriate tools are needed to manage real-world data complexity – IBM has proprietary tools to help you achieve this.

IBM bridges the gap. The company offers an open source-style starting point, adding tools and services that allow you to reach production.

You can continue to use most of your preferred open-source tools, but you also have an easy-to-use environment in which to share your model with all the data stakeholders, including business experts, and easy-to-use production tools that allow you to deliver a working solution that’s robust, secure and easy to maintain and update.

We need to make a change now

The whole planet is going through a transformation that’s being realised partially through technology and partially in ourselves. If you are a developer and want to be part of this big change, consider joining the IBM Call for Code Global Challenge 2021.

The Challenge is a great way to improve Earth’s health and increase the scope of your future. Join the IBM Call for Code Challenge now!

More information will be delivered online during the Data and AI Forum Italy, an event dedicated to data centrality inside the data transformation journey of Italian organizations. Create your free IBM Cloud account here.

The IBM AI ladder is a multi-step information architecture

The proposed IBM architecture brings all the data value chain elements together on the same platform.

Cloud Pak for Data is based on a 4-step model: collect, organize, analyze, and infuse. Some of its services are included in the Watson Studio solution portfolio. You can find more information on the so-called AI ladder by consulting this technical article.

Image portraying IBM's offer of a full-cycle solution for developing AI/ML processes that are ready for production.
The architecture of Cloud Pak for Data

Data and digital transformation

The environment as a whole is rich with modules and options. Data scientists will feel at home in the analysis area. The platform also makes it simple to explain, to people whose expertise is in business rather than data, the contribution Cloud Pak for Data can make to the production process.

All components live on a single, consistent platform, and any contribution can be taken into account. Business people can understand the data analyst, and most technical developers can complete the full cycle. Many languages can be used simultaneously (Python, Scala, R…), so every contribution adds to the available tools. The user interface is easy – it will become familiar in minutes – and offers different levels: viewer, developer, or analyst – each has a place within the platform.

This set-up means that users get the best of all worlds and can clearly understand the implications of each contribution: the security expert contributes their experience in a way that can easily be understood by other technical experts, for whom security may not be their strongest suit. The same applies to governance and other relevant aspects.

All of this expertise is available to all team members and will be ready and waiting when a problem arises.

Image showing the rich toolset for data scientists, which includes high-profile OS tools such as Torch, TensorFlow, Keras, SparkML, Scikit-learn, and so on.
Watson Studio tools and their relationship with Cloud Pak for Data

Cloud Pak for Data as a Service 

Cloud Pak for Data is a comprehensive platform that houses many services, Watson Studio among them.

Cloud Pak for Data provides users with an integrated set of capabilities for collecting and organizing data into a trusted, unified view and the ability to create and scale AI models across your business.

Cloud Pak for Data as a Service includes these features:

  • Streamlined administration:
    • No installation, management, or updating of software or hardware
    • Easy to scale up or down
    • Secure and compliant
    • A subscription with a single monthly bill
  • Integrated experience for working with data:
    • Connect to and catalog data sources on any Cloud
    • Provision, populate, and use a governed data lake
    • Run an end-to-end data science lifecycle
    • Access AI services to transform customer interactions

Architecture of Cloud Pak for Data as a Service

Cloud Pak for Data as a Service provides a single, unified interface for a set of core services and their related services.

Image of IBM Watson Studio, which is part of the Cloud Pak for Data as a Service portfolio.
The architecture of Cloud Pak for Data as a Service


With Cloud Pak for Data as a Service, you can create these types of services from the integrated services catalogue:

  • Core services to govern data, analyze data, run and deploy models
  • Services that supplement core services by adding tools or computation power
  • IBM Cloud database services to store data for use in the platform
  • Watson OpenScale, Watson Assistant, and other Watson services that have their own UIs or provide APIs for analyzing data

The sample gallery provides data assets, notebooks, and projects. Sample data assets and notebooks provide examples of data science and machine learning code. Sample projects contain a set of assets and detailed instructions on how to solve a particular business problem.
Integrations with other Cloud platforms can be configured so that users can easily create connections to data sources on external platforms.

Users can create connections to other Cloud data sources or on-premises databases to work with data without moving it.

This illustration shows the functionality included in the common platform, the core services, and the supplementary services.

Diagram of Watson Studio, Watson Knowledge Catalog, and Watson Machine Learning services listed with the common platform functionalities.
Watson Studio services in greater detail.

The following functionality is provided by the platform:

  • Administration at the account level, including user management and billing
  • Storage for projects, catalogs, and deployment spaces in IBM Cloud Object Storage
  • Global search for assets and artifacts across the platform
  • Platform assets catalog for sharing connections across the platform
  • Role-based user management within collaborative workspaces across the platform
  • Common infrastructure for assets, projects, catalogs, and deployment spaces
  • A services catalog for provisioning additional service instances

Watson Studio provides the following types of functionality in projects:

  • Tools to prepare, analyze, and visualize data, and build models
  • Environment definitions to provide compute resources

Watson Machine Learning provides the following functionality:

  • Tools to build models in projects
  • Tools to deploy models and manage deployed models in deployment spaces
  • Environment definitions to provide compute resources

Watson Knowledge Catalog provides the following functionality:

  • Catalogs to share assets
  • Governance artifacts to control and enrich catalog assets
  • Categories to organize governance artifacts
  • Tools to prepare data in projects

Watson Studio Tools

This slide provides just a hint of the scope of Watson Studio Tools. The starting webpage is located here. As a data scientist, you may work on a project where the quantity of data is impossible to manage with traditional Python libraries such as Pandas.

Cloud services give you a data machine that can work on data in parallel: Spark, a solution integrated into the Cloud version of Watson Studio, will help you do this. Using Spark, your job can finish in a limited time, even when the same data set would require a very long processing time with ordinary Python tools.
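As a rough illustration of the difference, here is a minimal PySpark sketch that reads a CSV too large for Pandas and aggregates it in parallel (the file name and label column are hypothetical stand-ins for the example data set used later in this article):

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; in Watson Studio a Spark
# environment provides one for your notebook.
spark = SparkSession.builder.appName("bankruptcy-demo").getOrCreate()

# Spark partitions the read across executors instead of loading
# the whole file into a single process's memory, as Pandas would.
df = spark.read.csv("bankruptcy.csv", header=True, inferSchema=True)

# Aggregations run in parallel over the partitions.
# "Bankrupt?" is a hypothetical label column name.
df.groupBy("Bankrupt?").count().show()
```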

A graphic representation of Auto AI, SPSS Modeler, Data refinery flow, decision optimization which are the main tools included in IBM Watson environment.
IBM Watson Studio tools at a glance.

Let’s subscribe to Watson Studio!

The cloud service page has a header that reflects what you have subscribed to: if you start from Watson Studio and use only that one service, the header shows Watson Studio; as soon as you add any other Cloud Pak for Data service, the header becomes your own Cloud Pak for Data.

You can go to this page to join Watson Studio for free. The page automatically appears in your language area.

A picture of IBM Watson Studio join page with a free account.
IBM Watson Studio join page.

Access many IBM services including Watson Studio here.

Starting a new project in CP4DaaS

A screenshot of IBM Cloud Pak for Data project Page.
IBM Cloud Pak for Data project Page

GitHub feels like a good place to be right now. There is a short list of direct steps to perform:

  • Enter the project
  • Enter the GitHub page
  • Create a repository
  • Create a data asset
  • Create a data set
  • Create models

At this point, it becomes clear that you are no longer in the GitHub-based world but have joined a larger world that will help you to develop the best solution.

Image: Adding collaborators in IBM Cloud Pak for Data
Adding collaborators in IBM Cloud Pak for Data: any new collaborator can be a viewer, an editor, or an administrator.

As a technician, I access CP4D in overview mode, where I see tools for:

  • Roles
  • Governance

As the admin, I can add new people with one of these roles:

  • Admin
  • Editor, who works on the code
  • Viewer, who can only view

I can also write a readme file to serve as a project diary.
Screenshot of IBM Cloud Pak for Data assets page.
You always have an overview of your data assets at a glance. The “Bankrupt data” file will be used in the next example.

The asset list can be displayed; environments count among these assets.

Image of IBM Cloud Pak for Data environments page.
Python, Scala, Spark are some common options for your execution environment.

You can easily develop your model, making the best choices for your project in the process. This means that you can show a complete project to any project stakeholders, from team members to clients or prospects, and exploit many options.

Image: IBM Cloud Pak for Data new Environment definition
Environments are defined by your specific needs in that project
Screenshot of IBM Cloud Pak for Data's execution tokens to run a model.
Your executions consume resources that are reported in your account.


The Lite plan is described on the CP4D page. You are given a certain quantity of resources through tokens called Capacity Unit Hours, or CUHs. The free plan includes a limited amount of CUHs: you can buy more, should you need them.

Data upload

It is now time to upload data to test your model.

A screenshot of the Data assets page from IBM Cloud Pak for Data.
Your data assets are listed on the appropriate page. We will choose one for this example.

You have a list of data assets. You have to share one of these datasets.

This is a straightforward task: simply drag and drop onto the far right-hand column, named “Data”. You can also upload datasets from your computer’s desktop.

The Data column has three tags: “Load”, “Files”, and “Catalog”.

Being competitive with code is important these days, but the specific language or code used can change over time. Many tasks currently assigned to Python coding may well be accomplished in Scala in the near future, and large numbers of tasks will be coded automatically for security and integration purposes. Rich platforms will become the reference choice.

Returning to the data: in this example we select the CSV file named Bankruptcy. You can see its content without opening the programming notebook – one of the simplified options IBM Cloud Pak offers. Many other tasks are performed directly too, to avoid opening a notebook just for these small operations.

A screenshot of IBM Cloud Pak and Data exploring data sets.
Many simplified options are easily available inside the extended notebook.

CSV data are categorized without giving commands inside a notebook.

You can easily access a lot of information, such as the data types (strings versus numbers).
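For comparison, here is roughly what the same preview would require in a notebook with Pandas (the file name “Bankruptcy.csv” is an assumption based on the asset used in this walkthrough):

```python
import pandas as pd

# Load the sample file used in this walkthrough (hypothetical name).
df = pd.read_csv("Bankruptcy.csv")

# Inspect the inferred column types (e.g. object/string vs. numeric) --
# the platform shows this automatically, without any code.
print(df.dtypes)

# Quick summary statistics for the numeric columns.
print(df.describe())
```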

You can then create a job, something that can be relaunched as a function: you see this in action here.

A screenshot of IBM's Data Refinery tools.
Some of the most frequent operations can be performed directly.

You can access many of these data options even if you are not a skilled Python programmer. In most cases deep expertise is unnecessary, and basic programming knowledge is enough.

What’s more important is that a skilled person in a specific field – such as finance in our example – can make an important contribution. We are dealing with financial data in our “bankruptcy” file example, so an accountant can easily discover all the incorrect data categories a programmer might overlook.

Data analysis

Data pre-processing can be done in a few simple steps. We’re not dealing with this subject here, so let’s assume it’s all ready to go.

Now you must create your notebook.

You are still on the Assets page in the Cloud Pak for Data framework. By clicking the blue button, the “Choose asset type” window appears.

Screenshot of IBM Cloud Pak for Data showing how to choose the preferred notebook options.
Asset types are prompted for you to make a choice of development environment.

The first thing to consider is the notebook option list. Somebody on the team has already worked on the notebook for this example, so you can see all the specifications now.

Keep the processing costs in mind. Previews consume no CUHs, so you can get free advice from all data stakeholders.

You can also load models, monitor them, and even delete some of them without ever leaving the notebook – a really important feature.

Looking at the proposed code inside the notebook, you can perform some important checks, such as seeing which libraries are used and in which versions (in Python this can easily lead to confusion!).

In this example, we see that the scikit-learn framework has been used, version 0.23. You then have to check for any possible compatibility issues.
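If you prefer to verify this from inside the notebook, a quick sanity check might look like this:

```python
import sys
import sklearn

# Confirm which interpreter and scikit-learn release the notebook
# environment actually provides before chasing compatibility issues.
print(sys.version)
print(sklearn.__version__)  # expected: 0.23.x in this example
```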

Scikit-learn framework, version 0.23
Checking the code provides you with very useful execution information. 
The scikit-learn framework, version 0.23, has been used in this example.
An image showing automatic code generation.
Data import, or file import, can be safely inserted into the solution code by the automatic generation options.

Some lines of code can be generated automatically while importing data or files.
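As a hedged sketch of what such a generated snippet typically looks like when the asset lives in IBM Cloud Object Storage (every credential, endpoint, and bucket name below is a placeholder the generator would fill in for you):

```python
import pandas as pd
import ibm_boto3
from ibm_botocore.client import Config

# Placeholder values: the code generator fills these in from your
# project's Cloud Object Storage credentials.
cos = ibm_boto3.client(
    service_name="s3",
    ibm_api_key_id="YOUR_API_KEY",
    ibm_service_instance_id="YOUR_INSTANCE_ID",
    config=Config(signature_version="oauth"),
    endpoint_url="https://s3.eu-de.cloud-object-storage.appdomain.cloud",
)

# Stream the data asset straight into a DataFrame.
body = cos.get_object(Bucket="YOUR_BUCKET", Key="Bankruptcy.csv")["Body"]
df = pd.read_csv(body)
```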

IBM Cloud Pak for Data: code writing.
Code writing is used when needed. Security and compatibility issues are often solved by design.

The proposed lines of code are written automatically. The best solution is to load all rules into an automatic coder that will write compatible and secure code.

You can also take previously handwritten code based on a different framework or library, analyse it, and let the automatic code generator write the missing code lines, which will allow the old code to run on the new notebook executed inside Watson Studio or Cloud Pak for Data.

Bringing models into production

Deployment time. All projects and teams are represented on a single page.

Now it’s time to input the API key (generated automatically from the IBM Cloud website).

The API Key is generated inside Watson Studio.

Specify a resource location: Frankfurt, in this case. 

You can now manage the contents of this model. If it performs well, you can easily manage the environment by bringing it to the “Deployment Space” area, a repository of models.

To recap our earlier actions, we analyzed the CSV file named Bankruptcy and created a related model. This model has been named XBG_Bankruptcy and has been loaded inside the deployment space. It is now ready to be used through API calls.

The deployment space gives us, at a glance, many different ways to call it: cURL, Java, JavaScript, Python, and Scala code snippets. We have to complete them with the API key, add the model specifications, and the model is ready!

Let’s call our model! cURL, Java, JavaScript, Python, and Scala code are all allowed.
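To give an idea of the Python variant, here is a sketch under stated assumptions: the IAM token exchange is IBM Cloud's standard flow, while the deployment URL, feature names, and input values are placeholders you would copy from your own deployment space:

```python
import requests

API_KEY = "YOUR_IBM_CLOUD_API_KEY"  # generated on the IBM Cloud website

# Exchange the API key for a bearer token (IBM Cloud's standard IAM flow).
token = requests.post(
    "https://iam.cloud.ibm.com/identity/token",
    data={
        "apikey": API_KEY,
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
    },
).json()["access_token"]

# Placeholder scoring endpoint for the deployed XBG_Bankruptcy model;
# copy the real URL from your deployment space.
scoring_url = (
    "https://eu-de.ml.cloud.ibm.com/ml/v4/deployments/"
    "YOUR_DEPLOYMENT_ID/predictions?version=2021-05-01"
)

# Field names and values are illustrative; use your model's input schema.
payload = {
    "input_data": [{
        "fields": ["feature_1", "feature_2"],
        "values": [[0.12, 0.34]],
    }]
}

response = requests.post(
    scoring_url,
    json=payload,
    headers={"Authorization": "Bearer " + token},
)
print(response.json())
```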

Conclusions

Only a data scientist understands all the modeling phases when open-source tools are used. The value nonetheless increases if you can engage business people through a clear process.

Security, scalability, and governance are problems that must be solved. Thanks to the CP4DaaS platform and the whole Watson Studio suite of tools, all of these issues can be tackled without involving a dedicated ICT structure: you deal only with the components you directly need.

Let’s dive into code!

As previously stated, more information will be delivered online during the Data and AI Forum Italy. This event is dedicated to data centrality inside the data transformation journey of Italian organizations.

Above all, let me draw your attention once again to the IBM Call for Code Challenge. This Challenge is a great way to contribute to improving the planet’s health and to broaden the scope of your future. Create your free IBM Cloud account here to discover more.


Leo Sorge
I hold a degree in electronics. I have been talking and writing about science and technology, in both real and close-to-real worlds, since 1976. I frankly believe that business plans and the singularity are excellent starting points for science-fiction stories.
