I am loading a flat file to a table but I also need to scrub the data a bit before the data hits the table. The main update required is converting a dot (.) character to a null value. The source file is using this character to indicate a blank. I know I can use the Dervived Column Transformation, but I have quite a few columns which will take a while to manually configure. Is there another transformation option that anyone can point me to?
Thanks
Under this scenario, is that the only value in the column? Or are you searching/replacing (.) with NULLs?|||I am using a conditional evaluation in the derived column transformation:
[Coulmn1] == "." ? (DT_STR,50,1252)NULL(DT_STR,50,1252) : [Column1]
The columns would never have a value that contains a dot, only values or the dot
|||
crancilio wrote:
I am loading a flat file to a table but I also need to scrub the data a bit before the data hits the table. The main update required is converting a dot (.) character to a null value. The source file is using this character to indicate a blank. I know I can use the Dervived Column Transformation, but I have quite a few columns which will take a while to manually configure. Is there another transformation option that anyone can point me to?
Thanks
Not that I am awre of. I am afraid you have to do the same thing for every column affected by that logic. Perhaps, you could try to use an script component where you could use the magic of copy and paste...
|||Let's use a script component instead of a derived column. You'll be very happy with the following solution:Instead of the derived column, add a script component, set it to be a transformation.
Select the columns you wish to work with. Set their usage types to "ReadWrite." ONLY select the columns you wish to process with this logic.
Below is the script. I don't understand it fully, and I've hacked something that our forum user, jaegd wrote:
Code Snippet
Imports System
Imports System.Data
Imports System.Math
Imports System.Text
Imports System.Collections.Generic
Imports Microsoft.SqlServer.Dts.Pipeline
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
'Note: this code was originally written/posted by the SSIS forum user, jaegd. http://forums.microsoft.com/MSDN/User/Profile.aspx?UserID=133544&SiteID=1
'Credit has been given where credit is due
'Original post: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=864401&SiteID=1
Public Class ScriptMain
Inherits UserComponent
Private inputBuffer As PipelineBuffer
Private cols As Dictionary(Of Int32, ColumnInfo) = New Dictionary(Of Int32, ColumnInfo)
Private currentColumnInfo As ColumnInfo = New ColumnInfo
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
'Setup control counter
Dim counter As Integer = 0
'Loop through segments
'MsgBox(currentColumnInfo.colIndex.ToString)
For Each currentcolumn As KeyValuePair(Of Int32, ColumnInfo) In cols
'MsgBox(inputBuffer.GetString(currentcolumn.Key))
If (inputBuffer.GetString(currentcolumn.Key)) = "." Then
inputBuffer.SetString(currentcolumn.Key, Chr(0))
End If
Next
End Sub
Public Overrides Sub ProcessInput(ByVal InputID As Integer, ByVal Buffer As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer)
' Get the Pipeline Buffer for subsequent ordinal column access
inputBuffer = Buffer
MyBase.ProcessInput(InputID, Buffer)
End Sub
Public Overrides Sub PreExecute()
BuildColumnDictionary()
End Sub
Private Sub BuildColumnDictionary()
Dim indexes() As Integer
Dim input As IDTSInput90
Dim col As IDTSInputColumn90
Dim offset As Integer = 0
input = Me.ComponentMetaData.InputCollection(0)
'presumes GetColumnIndexes order matches iterator order
'as BufferManager is not available to my knowledge in ScriptComponent
indexes = Me.GetColumnIndexes(input.ID)
For Each col In input.InputColumnCollection
Dim columnStructure As New ColumnInfo
With columnStructure
.colName = col.Name
.colLength = col.Length
.colIndex = indexes(offset)
'Normally, BufferManager would be used, but its not exposed in Script Component
.colPrecision = col.Precision
.colScale = col.Scale
.colType = col.DataType
End With
cols.Add(indexes(offset), columnStructure)
offset += 1
Next
End Sub
Public Structure ColumnInfo
Dim colName As String
Dim colType As DataType
Dim colIndex As Int32
Dim colLength As Int32
Dim colPrecision As Int32
Dim colScale As Int32
End Structure
End Class
A screenshot of the results: HERE|||Note: I would LOVE it if someone would come in and simplify the code above. I just don't have the understanding of the SSIS programming model to do anything BUT hack code together. Not yet anyway.
Also, the work happens in the Input0_ProcessInputRow sub. Periods are replaced with Chr(0) which is null. Give it a shot, tweak it to however you need, etc...
You may want to trim() the columns first, before going into this transformation.|||This is great - thanks! I will let you know how it goes|||
Phil - thank you so much for your help!
This code was exactly what I was looking for. The only tweak I had to make was to use SetNull() instead of SetString() - the Chr(0) actually added an empty string rather than a null.
Thanks again!
|||
Phil Brammer wrote:
Note: I would LOVE it if someone would come in and simplify the code above. I just don't have the understanding of the SSIS programming model to do anything BUT hack code together. Not yet anyway. Also, the work happens in the Input0_ProcessInputRow sub. Periods are replaced with Chr(0) which is null. Give it a shot, tweak it to however you need, etc...
You may want to trim() the columns first, before going into this transformation.
How's this?
Note - this is really only useful for generically accessing each column, and has no type safety, so the code only works for string columns. jaegd's original code provided a lot more information about the columns, including type information, which would allow you to add conditional logic to use the appropriate accessor (GetString, GetInt32, etc). I've also used the System.Reflection to access the buffer, but I need to test the performance a bit more before pushing that as a solution.
Code Snippet
'Note: this code was originally written/posted by the SSIS forum user, jaegd. http://forums.microsoft.com/MSDN/User/Profile.aspx?UserID=133544&SiteID=1
'Credit has been given where credit is due
'Original post: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=864401&SiteID=1
'Trimmed to smaller set of code by jwelch
Imports Microsoft.SqlServer.Dts.Pipeline
PublicClass ScriptMain
Inherits UserComponent
Private inputBuffer As PipelineBuffer
PublicOverridesSub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim counter AsInteger = 0
For counter = 0 To inputBuffer.ColumnCount - 1
If (inputBuffer.GetString(counter)) = "."Then
inputBuffer.SetString(counter, Chr(0))
EndIf
Next
EndSub
PublicOverridesSub ProcessInput(ByVal InputID AsInteger, ByVal Buffer As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer)
' Get the Pipeline Buffer for subsequent ordinal column access
inputBuffer = Buffer
MyBase.ProcessInput(InputID, Buffer)
EndSub
EndClass
|||
jwelch wrote:
How's this?
Yep, I like that MUCH better. Knew jaegd had much more stuff in there, I just didn't quite know where to begin to trim it down... Ran out of time too... The golf course beckoned.
No comments:
Post a Comment