Questions
How do you create a custom pipeline in the `transformers` library?
The Scenario
You are working on a new NLP task that is not supported by any of the existing pipelines in the transformers library. You need to create a custom pipeline that can perform this task.
The task is to classify the sentiment of a movie review as either “positive” or “negative”, but you also want to return the confidence score of the prediction as a percentage.
The Challenge
Explain how to create a custom pipeline in the transformers library. What are the key methods that you need to implement in a custom pipeline class?
A junior engineer might not be aware that it's possible to create a custom pipeline. They might try to implement the entire inference logic from scratch, which would be time-consuming and would not be well-integrated with the `transformers` library.
A senior engineer would know that the `transformers` library is designed to be extensible and that it is easy to create a custom pipeline. They would be able to explain the key methods that need to be implemented in a custom pipeline class and would have a clear plan for how to do it.
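To see how the key methods fit together, the following is a minimal sketch of the call flow. `ToyPipeline` is a simplified stand-in written for illustration, not the library's implementation, and `WordCount` is a hypothetical task that needs no model. The real `Pipeline` base class also handles batching, device placement, and parameters set at construction time.

```python
class ToyPipeline:
    """Simplified stand-in for transformers.Pipeline (illustration only)."""

    def __call__(self, inputs, **kwargs):
        # Split the caller's keyword arguments into one dict per stage.
        pre_params, forward_params, post_params = self._sanitize_parameters(**kwargs)
        # Run the three stages in order, passing each stage its own parameters.
        model_inputs = self.preprocess(inputs, **pre_params)
        model_outputs = self._forward(model_inputs, **forward_params)
        return self.postprocess(model_outputs, **post_params)


class WordCount(ToyPipeline):
    """Hypothetical task: counting words stands in for a real model."""

    def _sanitize_parameters(self, **kwargs):
        return {}, {}, {}

    def preprocess(self, inputs):
        return {"tokens": inputs.split()}

    def _forward(self, model_inputs):
        return {"count": len(model_inputs["tokens"])}

    def postprocess(self, model_outputs):
        return {"words": model_outputs["count"]}


result = WordCount()("I love this movie!")
print(result)  # {'words': 4}
```

A subclass only supplies the four methods; the base class decides when each one runs.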
Step 1: Subclass the Pipeline Class
The first step is to create a new class that inherits from the transformers.Pipeline class.
    from transformers.pipelines import Pipeline

    class SentimentAnalysisWithScore(Pipeline):
        # ...
Step 2: Implement the _sanitize_parameters method
This method splits the keyword arguments passed to the pipeline among its three stages. It must return a tuple of three dictionaries: the parameters for `preprocess`, for `_forward`, and for `postprocess`, in that order.
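As a rough illustration of the return value, the sketch below routes two keyword arguments into the three dictionaries that `_sanitize_parameters` returns (one each for `preprocess`, `_forward`, and `postprocess`). It uses a plain class rather than a real `Pipeline` subclass so that it runs without a model. The argument name `as_percentage` is hypothetical, and `truncation` is assumed here to be forwarded to the tokenizer.

```python
class ParamRoutingSketch:
    """Plain class used only to show the shape of the return value."""

    def _sanitize_parameters(self, truncation=None, as_percentage=None, **kwargs):
        # Only include a parameter if the caller actually supplied it,
        # so that stage defaults are not overwritten with None.
        preprocess_params = {}
        if truncation is not None:
            preprocess_params["truncation"] = truncation

        postprocess_params = {}
        if as_percentage is not None:  # hypothetical argument name
            postprocess_params["as_percentage"] = as_percentage

        # Order matters: preprocess, _forward, postprocess.
        return preprocess_params, {}, postprocess_params


routed = ParamRoutingSketch()._sanitize_parameters(as_percentage=False)
print(routed)  # ({}, {}, {'as_percentage': False})
```

Each stage method then needs a matching keyword argument in its own signature to receive the value routed to it.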
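    def _sanitize_parameters(self, **kwargs):
        # Route each keyword argument to the stage that consumes it.
        preprocess_params = {}
        postprocess_params = {}
        # ...
        return preprocess_params, {}, postprocess_params
Step 3: Implement the preprocess method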
This method is used to pre-process the input data. It should take the input data as an argument and return a dictionary of tensors that can be fed to the model.
    def preprocess(self, inputs, **kwargs):
        # ...
        return self.tokenizer(inputs, return_tensors=self.framework)
Step 4: Implement the _forward method
This method is used to perform the forward pass of the model. It should take the output of the preprocess method as an argument and return the output of the model.
    def _forward(self, model_inputs):
        # ...
        return self.model(**model_inputs)
Step 5: Implement the postprocess method
This method is used to post-process the output of the model. It should take the output of the _forward method as an argument and return the final output of the pipeline.
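The arithmetic behind this step can be sketched without any framework: apply a softmax to the logits, pick the most probable class, and format its probability as a percentage. The logits and the `id2label` mapping below are made-up values for illustration; in the pipeline they would come from the model output and `self.model.config.id2label`.

```python
import math


def softmax(logits):
    # Subtract the maximum first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # hypothetical label mapping
logits = [-1.0, 3.0]  # made-up model output for one review

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
result = {"label": id2label[best], "score": f"{probs[best] * 100:.2f}%"}
print(result)  # {'label': 'POSITIVE', 'score': '98.20%'}
```

Returning the score as a formatted string is convenient for display, but a caller that needs to compare or threshold scores would have to parse it back into a number.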
    def postprocess(self, model_outputs, **kwargs):
        # ... derive `label` and `score` from model_outputs ...
        return {"label": label, "score": score}
Step 6: Putting it all together
Here is the complete code for our custom pipeline:
    from transformers.pipelines import Pipeline

    class SentimentAnalysisWithScore(Pipeline):
        def _sanitize_parameters(self, **kwargs):
            # No extra parameters: empty dicts for preprocess, _forward, and postprocess.
            return {}, {}, {}

        def preprocess(self, inputs, **kwargs):
            # Tokenize the raw text into framework tensors.
            return self.tokenizer(inputs, return_tensors=self.framework)

        def _forward(self, model_inputs):
            return self.model(**model_inputs)

        def postprocess(self, model_outputs, **kwargs):
            # Take the logits of the single input; softmax(dim=-1) assumes PyTorch tensors.
            logits = model_outputs.logits[0]
            probs = logits.softmax(dim=-1)
            score = probs.max().item()
            label = self.model.config.id2label[probs.argmax().item()]
            # Report the confidence as a percentage string.
            return {"label": label, "score": f"{score*100:.2f}%"}

    # Use the custom pipeline (my_model and my_tokenizer are assumed to be loaded already)
    my_pipeline = SentimentAnalysisWithScore(model=my_model, tokenizer=my_tokenizer)
    my_pipeline("I love this movie!")
    # {'label': 'POSITIVE', 'score': '99.98%'}
Practice Question
You want to add a new argument to your custom pipeline that can be used to control the behavior of the `postprocess` method. Where would you define this argument?