How do you create a custom pipeline in the `transformers` library?

The Scenario

You are working on a new NLP task that is not supported by any of the existing pipelines in the transformers library. You need to create a custom pipeline that can perform this task.

The task is to classify the sentiment of a movie review as either “positive” or “negative”, but you also want to return the confidence score of the prediction as a percentage.

The Challenge

Explain how to create a custom pipeline in the transformers library. What are the key methods that you need to implement in a custom pipeline class?

Wrong Approach

A junior engineer might not be aware that it's possible to create a custom pipeline. They might try to implement the entire inference logic from scratch, which would be time-consuming and would not be well-integrated with the `transformers` library.

Right Approach

A senior engineer would know that the `transformers` library is designed to be extensible: a custom pipeline is a subclass of `Pipeline` that implements four methods, `_sanitize_parameters`, `preprocess`, `_forward`, and `postprocess`. They would be able to explain what each method does and would have a clear plan for implementing them.

Step 1: Subclass the `Pipeline` Class

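The first step is to create a new class that inherits from the `transformers.Pipeline` class. The base class supplies the call logic that chains your methods together, so the subclass only has to define the individual stages.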

from transformers.pipelines import Pipeline

class SentimentAnalysisWithScore(Pipeline):
    # ...
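
To make the division of labour concrete, here is a minimal sketch of how the base class is expected to chain the four methods when the pipeline is called. `ToyPipeline` and `WordCount` are hypothetical stand-ins written for illustration, not the library's code; the real `Pipeline.__call__` also deals with batching, device placement, and list inputs.

```python
class ToyPipeline:
    """Simplified stand-in for the call flow; not the library's implementation."""

    def __call__(self, inputs, **kwargs):
        # Split caller kwargs into one dict per stage.
        pre_kw, fwd_kw, post_kw = self._sanitize_parameters(**kwargs)
        model_inputs = self.preprocess(inputs, **pre_kw)
        model_outputs = self._forward(model_inputs, **fwd_kw)
        return self.postprocess(model_outputs, **post_kw)


class WordCount(ToyPipeline):
    # Hypothetical subclass: the "model" is just a word counter.
    def _sanitize_parameters(self, **kwargs):
        post = {"unit": kwargs["unit"]} if "unit" in kwargs else {}
        return {}, {}, post

    def preprocess(self, inputs):
        return {"tokens": inputs.split()}

    def _forward(self, model_inputs):
        return {"count": len(model_inputs["tokens"])}

    def postprocess(self, model_outputs, unit="words"):
        return f"{model_outputs['count']} {unit}"


print(WordCount()("I love this movie!"))                 # 4 words
print(WordCount()("I love this movie!", unit="tokens"))  # 4 tokens
```

The subclass never writes its own `__call__`; it only fills in the four stages, which is the same contract the steps below implement against the real base class.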

Step 2: Implement the `_sanitize_parameters` method

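This method sorts the keyword arguments passed to the pipeline, either at construction or at call time, and routes each one to the stage that uses it. It returns a tuple of three dictionaries: the keyword arguments for `preprocess`, for `_forward`, and for `postprocess`.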

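    def _sanitize_parameters(self, **kwargs):
        # Start with empty dicts and copy over only what the caller supplied.
        preprocess_params, postprocess_params = {}, {}
        # ...
        return preprocess_params, {}, postprocess_params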
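
As a minimal sketch of that routing, the standalone function below mirrors the method's return shape outside the class, so it runs without loading a model. The `as_percentage` argument is a hypothetical name chosen for illustration and is not defined by the library; `truncation` is assumed to be forwarded to the tokenizer call in `preprocess`.

```python
def sanitize_parameters(**kwargs):
    """Split caller kwargs into per-stage dicts.

    Standalone stand-in for Pipeline._sanitize_parameters; the argument
    names handled here are assumptions made for this example.
    """
    preprocess_params = {}
    forward_params = {}
    postprocess_params = {}

    # Consumed by preprocess(): whether to truncate long inputs.
    if "truncation" in kwargs:
        preprocess_params["truncation"] = kwargs["truncation"]

    # Consumed by postprocess(): hypothetical output-format switch.
    if "as_percentage" in kwargs:
        postprocess_params["as_percentage"] = kwargs["as_percentage"]

    return preprocess_params, forward_params, postprocess_params


routed = sanitize_parameters(as_percentage=True)
print(routed)  # ({}, {}, {'as_percentage': True})
```

Only keys the caller actually passed are forwarded, so each stage keeps its own default when an argument is omitted.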

Step 3: Implement the `preprocess` method

This method is used to pre-process the input data. It should take the input data as an argument and return a dictionary of tensors that can be fed to the model.

    def preprocess(self, inputs, **kwargs):
        # ...
        return self.tokenizer(inputs, return_tensors=self.framework)

Step 4: Implement the `_forward` method

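This method runs the forward pass of the model. It takes the output of the `preprocess` method as its argument and returns the output of the model. The base class calls it through `forward`, which handles device placement, so `_forward` only needs to run the model and should not be called directly.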

    def _forward(self, model_inputs):
        # ...
        return self.model(**model_inputs)

Step 5: Implement the `postprocess` method

This method is used to post-process the output of the model. It should take the output of the _forward method as an argument and return the final output of the pipeline.

    def postprocess(self, model_outputs, **kwargs):
        # ...
        return {"label": label, "score": score}
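
The arithmetic behind `label` and `score` can be sketched without a model. The helper below is a hypothetical pure-Python version of that step, assuming the model output is a list of raw logits and that `id2label` maps class indices to names; the logits shown are made up for illustration.

```python
import math


def to_label_and_percentage(logits, id2label):
    """Softmax over raw logits, then return the top label and its
    probability formatted as a percentage string."""
    # Subtract the max logit before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": f"{probs[best] * 100:.2f}%"}


# Hypothetical logits and label map, for illustration only.
result = to_label_and_percentage([-2.0, 2.0], {0: "NEGATIVE", 1: "POSITIVE"})
print(result)  # {'label': 'POSITIVE', 'score': '98.20%'}
```

Returning the score as a formatted string matches the scenario's requirement, but it means callers can no longer compare scores numerically; returning the float and formatting at display time is a reasonable alternative.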

Step 6: Putting it all together

Here is the complete code for our custom pipeline:

from transformers.pipelines import Pipeline

class SentimentAnalysisWithScore(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        return {}, {}, {}

    def preprocess(self, inputs, **kwargs):
        return self.tokenizer(inputs, return_tensors=self.framework)

    def _forward(self, model_inputs):
        return self.model(**model_inputs)

    def postprocess(self, model_outputs, **kwargs):
        logits = model_outputs.logits[0]
        probs = logits.softmax(dim=-1)
        score = probs.max().item()
        label = self.model.config.id2label[probs.argmax().item()]
        return {"label": label, "score": f"{score*100:.2f}%"}

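# Use the custom pipeline. The checkpoint name is an assumption; any
# sequence-classification model with an id2label mapping works the same way.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
my_model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
my_tokenizer = AutoTokenizer.from_pretrained(checkpoint)

my_pipeline = SentimentAnalysisWithScore(model=my_model, tokenizer=my_tokenizer)
my_pipeline("I love this movie!")
# e.g. {'label': 'POSITIVE', 'score': '99.98%'}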

Practice Question

You want to add a new argument to your custom pipeline that can be used to control the behavior of the `postprocess` method. Where would you define this argument?
